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This new approach leads to a substantial improvement over the existing 
schemes, both regarding the quality of the results (particularly for 
low signal to noise ratios) and the computational efficiency 

2. We apply the Bayesian approach to the solution of several problems, 
some of which are formulated and solved in these terms for the first 
time. Specifically, these applications are: The reconstruction of 
piecewise constant functions from noisy data; the reconstruction of piece 
wise continuous surfaces from sparse and noisy observations; the recon¬ 
struction of depth from stereoscopic pairs of images and the formation 

of perceptual clusters. 

3. For each one of these applications, we develop fast, deterministic 
algorithms that approximate the optimal estimators, and illustrate 
their performance on both synthetic and real data. 

4. We propose a new method, based on the analysis of the residual process 
for estimating the parameters of the probabilistic models directly 

from noisy observations. This scheme leads to an algorithm, which has 
no free parameters, for the restoration of piecewise uniform images. 

5. We analyze the implementation of the algorithms that we develop 

in nonconventional hardware, such as massively parallel digital machines, 
and analog and hybrid networks. 
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Chapter 1 


INTRODUCTION 


A fund; mental problem in the design and analysis of systems endowed with 
perceptual a hlities is the construction of internal representations of the physical 
structures in the external world. The precise form of these representations is not well 
understood, and is the subject of much current research in Artificial Intelligence 
and Psy chol )gy. It is clear, however, that these representations should integrate 
prior ge neric knowledge about the physical properties of the external world with 
measuremen s from a number of different sensory modalities. Furthermore, in 
order t<t> be < ffectively action-oriented, the representations should provide compact 
descriptions )f the physical structures of interest at different levels of detail. 

This pnblem is not exclusive of biological perceptual systems; it arises 
wheneve r ini Donation from a set of sensors has to be processed, stored and retrieved 
in an elficie t way. Thus, it is of fundamental importance, for example, in the 
design o f coi lputer vision systems; in the reconstruction of subterranean geological 
structures fre m geophysical data and in the design of biomedical imaging systems. 
The mol ivat 3n for this thesis is to increase our understanding of the principles 
underlying tie process of integrating prior generic constraints with the available 
observe ons, for the construction of these representations. In particular, we will 
address the j roblem of reconstructing, in a way that is consistent with the available 
sensory data, the value of certain properties of the physical structure of interest over 
a discrete zed 'egion of space. 

To defin i these early perceptual processes in a more precise way, let us model 
the specific p -operties of the physical structure as functions / that map a (compact) 
region on n into k m . In the most interesting cases, / will be either a scalar 
(m = 1) or ; vector field (m = 2) defined on a two-dimensional region. This is 
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example, of the problems of image restoration and segmentation, and 
ery of: depth from stereo; lightness; shape from shading; and the 
of optical flow in computer vision, as well as many problems in the 
;eological structure from geophysical measurements. 

assume that the available data consists of several sets of qualitatively 
isurements {g\, that in general are modeled as: 

ff. = Hi(f, Hf, D 2 f .n.) 

motes the derivative of the property /; n t - is a noise process, and Hi 
itor (for example, in vision problems, the different measurements may 
x stereo disparity; brightness; color, etc.). We will also assume that 
ion is collected with different sampling patterns (Si,.. S M }, that is, 
ons gi are defined only on the finite set 5 t - C H. Since most physical 
:onsist of events that occur at a variety of scales, and in general, 
ely different scales have little influence on one another, the numerical 
of the behavior of a property over a range of scales can be used 
produce a physically meaningful hierarchical decomposition of the 
:ure into individual substructures ("objects") which can be subsequently 
symbolic forms that are more compact and easy to manipulate (see 
ind 1982; it is not surprising that there is psychophysical evidence 
e presence of a multiscale processing hierarchy in the human visual 
Campbell and Robson, 1977, and Marroquin, 1976). 

e solutions we are looking for consist on a family {/ a } of numerical 
>f the function / at different scales (indexed by a) at the sites of some 
(the finest scale representation should correspond to the best estimate 
value of / at the sites of L). To illustrate this idea, in figure 1-a we 
try pattern, and in figures 1-b through 1-e, its numerical representation 
y coarser scales. This family of descriptions was generated by the 
cribed in section 5 of chapter 4. 

il, the observation processes do not determine the value of / in a 
:able way (that is to say, these problems are ill-posed in the sense of 








Hadarr :ird; :ee Poggio and Torre, 1984). Therefore, the algorithms we are looking 
for shojld te able to regularize the problem by incorporating constraints on the 
solutio ^ gen irated by some prior knowledge about its general characteristics. 

Fi ally, because of the large number of variables involved, reasonable speed 
of perf:>rmai ce will usually require that these algorithms be distributed, and thus, 
efficiently in plementable in parallel hardware. 

1. Regt ariz; tion Analysis and Cooperative Algorithms. 

Ar:ong the most successful solutions to these type of problems are those 
that fonnula e them as variational problems, where the measurement and generic 
constra its a e separately represented in the following way: 

Le: us c jnsider the case of only one set of ’’perfect” measurements (i.e., with 
no nois:) g c ffined on the set 5, and suppose that the constraints that they impose 
on the :>Dluti< n can be expressed in the form: 

f s A(f>9)= 0 
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positive definite, real valued function that measures the incompatibility 
of the property / with the observations g. In general, the observations 
erfect, and so, we will only require that the error S s A{f,g ) be small, 
tre may be a large number of configurations that minimize the error, 
lique solution, an assumption about the global smoothness of / is 
y means of some positive definite, real valued function P(f,Df ,...) 
res the "jaggedness" of /. If both A and P are convex, the desired 
be the unique minimizer of the "energy" functional: 

u(f, 9) = j s A(f, f) + X jf P(f, Df,.. .) (1) 

parameter. 

)roach has been applied with varying degrees of success to the 
surface interpolation (Grimson, 1981b, 1982; Terzoupulos, 1983, 
utation of visual motion (Horn and Schunk, 1981; Hildreth, 1984a,b); 
lape from shading information (Ikeuchi and Horn, 1981); computation 
contours (Ullman, 1976; Brady et al., 1980; Horn, 1981); lightness 
and edge detection (Torre and Poggio, 1983). 

mt paper, Poggio and Torre (1984) have shown how functionals of 
equation (1) can be derived in a rigorous and systematic way using 
methods (Tikhonov, 1963; Tikhonov and Arsenin (1977); Wahba 
context / n P is called a stabilizing functional, and X, the regularization 

! functional (1) is specified, its minimization can be carried out by 
ational methods (Courant and Hilbert, 1953). Since usually one is 
lie value of / only at the discrete set of points L, the solution of the 
r-Lagrange partial differential equations can be obtained as the fixed 
ixation (cooperative) algorithm of the form: 

/! t+1) = Fi(f W ) * € L (2) 

irithm can be efficiently implemented in parallel hardware using a 
►cally connected processors (one for each site i), or even by some 
k (see Poggio and Koch, 1984). 
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eresting to note that it is also possible, and sometimes easier, to 
rior knowledge about the solution, and the constraints imposed by the 
directly in a cooperative network of a given form, without explicitly 
Dbal variational principle. This approach has been used by Marr and 
) for the stereo matching problem. We will have more to say about it 

' possible, in principle, to incorporate qualitatively different measure- 
single cooperative process, by a simple modification of the energy 


that we have M sets of measurements, and that each set g t places some 
i / (and/or its derivatives) which can be expressed by the functionals: 

f s M$i> = 0 i = 1 ,..M 

don will now be constructed as the global minimizer of the functional: 


u(f) = £ «,(/, g) f Si A + x f Q P(f, Df,...) 



rameters measure the relative weight we wish to assign to each set 
ents. 

e functions A, are convex, the solution will again be unique, and 
tion of (3) may be carried out by means of a cooperative network 
h has been used by Terzopoulos (1985) for the surface interpolation 
m the depth value / is known at some set Si of sites, and the slope 
erent set S 2 ). 

roach we have been discussing — which we will call the ’’standard 
i method” is very attractive: it provides a unified framework for the 
Df a variety of problems, and it leads to computationally efficient 
owever, it has some important limitations (some of them pointed out 
d Torre): 

in the assumption that the solution / is smooth over the whole 
1 is not justified. What is more commonly true is that fi can be 
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1. Flexibility. 


cd into a small set of disjoint connected regions, and that while / 
h in the interior of each of them, it has discontinuities along the 
ies between regions (which in turn are piecewise smooth curves), 
itation is a serious one, because very often the discontinuities of 
i the regularization methods tend to hide, are the most important 
the surface, in particular if one is trying to compute a symbolic 
tation for it. 

ming of the parameters of the energy functional is not always 
d they often have to be selected on a purely empirical basis. 

cases, the choice of the particular (often quadratic) form for the 
s A and P is arbitrary, and is determined mainly by the tractability 
niqueness problem for the solution, and by the simplicity of the 
minimization algorithm (in some cases, of course, there may be 
joretical or experimental considerations that justify this choice). 

traction between qualitatively different observations is purely 
One would like to be able to include more realistic non-linear 
f interaction. 

ic Formulation. 

:nt approach is to model the function /, whose reconstruction solves 
problem, as a random field that has to be estimated from a set of 
Dssibly ambiguous measurements. Within this formulation, one can 
:sian viewpoint (see Good, 1983), and assume that the best way of 
e prior knowledge about the nature of the solutionis in the form of a 
bility distribution Pf. This distribution, together with a probabilistic 
f the noise that corrupts the observations, allows one to use Bayes 
ipute the posterior distribution Pf\ g , which represents the likelihod of 
jiven the observations g. In this way, we can solve the reconstruction 
Inding the estimate f which either maximizes this likelihood (the so 
um a Posteriori or MAP estimate), or minimizes the expected value 
to P f \ g ) of an appropriate error function. This formulation has several 
er the "Standard Regularization" approach: 
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nple modifications in the prior probabilistic model for /, one can 
Drithms that reconstruct not only smooth, but piecewise constant or 
ntinuous functions. It is also possible to include explicitly into the 
knowledge about the geometry of the curves that bound the smooth 
about the discontinuities) of /. 


broach provides a general framework for the formulation of a wide 
:rceptual problems. We will show, for instance, how it can be used 
:egmentation; surface reconstruction from sparse data; modeling of 
ouping processes; stereo matching, etc. Furthermore, the incorporation 
ly different measurements into a single cooperative estimation process 
in a natural way: if the noise processes n u n 2 , .. .,n M associated with 
sasurements g u .. .g M are independent, the joint posterior distribution 
w ) will be simply: 


-P(/ I Ql) • • •S'm) — 


/M/) i/) 

nil, p(9i) 


erpretation. 

imeters that appear in the reconstruction algorithms that are derived 
>roach have a precise statistical interpretation (for example, the relative 
J evidence provided by each set of observations, will be determined 
nee of the associated noise process); also, the plausibility of the 
tions about the behavior of the solution can be explicitly verified 
5 sample functions of the random field defined by P/, by means of 
te Monte Carlo procedure. Finally, one can choose the precise loss 
se expected value will be minimized by the Bayesian estimator. 

:>nal Efficiency. 

ill see, if the random field defined by P f is Markovian (i.e., if the 
iependencies are local), the estimation algorithms will be distributed, 
be possible to implement them efficiently in parallel hardware. 








3. Goals of his Thesis. 

The ob ective of this work is to apply the probabilistic approach we have just 
described tc the solution of a general class of perceptual problems. In particular, 
we will: 

1. Present a class of random fields with local probabilistic dependencies, that can 
be used verj effectively to model the behavior of a wide variety of functions. 

2. Develop i ^propriate loss functions, and the corresponding optimal estimators for 
different cla: scs of problems. 

3. Develop j eneral distributed algorithms for computing these estimates. 

4. Apply th< above results to several specific problems, to illustrate the generality 
and practica value of this approach. 

5. Develop r lore efficient algorithms for each of these particular cases. 

We nov present a list of our main contributions: 

3.1. Suirnnar r of our Main Contributions. 

1. Opti nal I ayesian Estimators. 

Several esearchers have used Bayes theory and Markov random field (MRF) 
models for he restoration of piecewise uniform images. It has been implicitly 
assume:! by most of them that the maximization of the posterior probability 
(which leads to the Maximum a Posteriori or MAP estimator) is the best possible 
performance criterion. We introduce the use of different specific error criteria 
for the jdesig i of the optimal Bayesian estimators for several classes of problems, 
and propose a general procedure (which is based on some existing Monte Carlo 
techniques, s ich as the Metropolis algorithm) for approximating them. We show, 
both thqoreti rally and experimentally (in particular for the case of the restoration of 
piecewise uniform images) that this new approach leads to a substantial improvement 
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ting methods, both regarding the quality of the results (particularly for 
noise ratios) and the computational efficiency. 

lications. 

lout this thesis we present several examples of the application of the 
approach, and of the optimal estimation procedures that we have 
everal problems, some of which are formulated and solved in these 
first time. The results that we get show that this approach can provide 
nework for the integration of a variety of related perceptual tasks into a 
ative process. Also, these results represent, in several cases, a significant 
t over those obtained using existing schemes. Specifically, these new 
are the following: 

:e Interpolation. 

blem of reconstructing a piecewise continuous surface from sparse and 
ta is formulated using a Bayesian approach, using two coupled MRF’s 
l the behavior of the smooth patches, and of the curves (discontinuities) 
nd them. Although this type of coupled model has been used before 
context of the restoration of piecewise uniform, noisy images), its 
)n to this problem requires some non-trivial modifications: the local 
ons between the elements of the fields have to be redefined in an 
ate way, and the general estimation algorithm has to be modified to 
:omputationally feasible. The practical value of the resulting algorithm 
ited using both synthetic and real data. 

I Matching. 

blem consists in finding the corresponding points in two signals when 
itained from the other by shifting it by a variable amount. We study in 
ipecific instance: the reconstruction of depth from a stereoscopic pair 
s, and show how to formulate it using our general framework. The 
ince of the algorithms that we construct is also illustrated by means of 
and real examples. 

ition of Perceptual Clusters. 
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nstruction of Piecewise Continuous Surfaces. 

ase, we also develop a heuristic, deterministic scheme whose experimental 
ance is practically equivalent to that of the Monte Carlo procedure, 
>roves significantly on its computational efficiency. 

o Matching. 

pose a new algorithm for solving the stereo matching problem in 
nple cases. This scheme is based on the direct implementation of the 
istraints (generated by the probabilistic model) in a highly distributed 
tive network of a particular form: a "Winner-Take-All" network. We 
;orously that, for noise-free observations, the state of this network will 
; to the correct solution, and estimate the maximum number of required 
s (which is usually very small). The application of this technique to 
nstruction of the depth of real objects from stereoscopic photographs 
sed, and some modifications to the algorithm are introduced, which 
is to produce results whose quality is comparable to those of other 
the art” algorithms. 

Estimation. 

ntext of the estimation of two-dimensional, binary fields, we study the 
le parameters that characterize the field model and the noise are not 
ave to be estimated from the noisy observations, a situation that, so far, 
n treated. We present a maximum likelihood procedure, which based 
s of the residual ("innovations") process, permits the simultaneous 
the field and the parameters of the system. We apply this technique 
uction of an algorithm, which does not have any free parameters, 
struction of piecewise uniform images, and perform experiments to 
ts performance. 

plementations. 

rtant issue regarding the practical value of the algorithms that we 
*ir possible implementation in certain non-conventional hardware, 










sue 



lively parallel digital machines; hybrid and analog computers, etc. In 
on, we make the following contributions: 

e Carlo Procedures. 

yze the parallel implementation of the general Monte Carlo procedure for 
nating the optimal Bayesian estimators. We show that the convergence 
in widely used algorithms (such as the Metropolis and Heat Bath 
) cannot be guaranteed in this case. We justify the selection of an 
iate algorithm (the "Gibbs Sampler"), and present an estimate of its 
itional complexity. 

istruction of Piecewise Continuous Surfaces. 

allel implementation of both the modified Monte Carlo procedure 
deterministic algorithm that solve this problem are analyzed, and 
nputational complexity is estimated. We also propose schemes for the 
tion of hybrid (digital/analog) and analog networks that implement 
xedures, and perform digital simulations to evaluate experimentally 
formance. 

ition of Two-Dimensional Binary Fields. 

iputational complexity of the parallel implementation of the fast 
istic algorithm that performs this task, is estimated and compared with 
ie general Monte Carlo scheme. 

propose the adaptation of a class of analog networks proposed by 
and Tank (1985), so that we can obtain an approximation to the 
estimate of the field from the equilibrium state of this system. The 
nee of this scheme is assessed experimentally by means of numerical 
ns. 

rerview. 

is is organized in the following way: 
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It^ chap ter two we will introduce the basic concept of a Markov random field; 
show how tc compute the corresponding probability distribution, and present Monte 
Carlo procc dures for generating sample functions. In chapter three, we develop 
loss functio lals for the image segmentation and surface reconstruction problems, 
and derive t le corresponding optimal Bayesian estimators. We also present general 
algorithms or computing these estimates, and discuss their implementation in 
parallel han ware. 

These esults are applied, in chapter four, to the problem of segmenting 
piecewise c< nstant images given noisy observations. For the particular case of 
binary mag ;s, a very efficient distributed algorithm is developed, and we present 
a procedure for the case when the model and the noise parameters are not known, 
and have to be estimated from the noisy data. Also in this chapter, we show how 
these piincii les can be applied to the problem of computing the perceptual clusters 
that are fora ed in some dot patterns. 

In chap er five, we treat the problem of reconstructing piecewise smooth surfaces 
from sparse and noisy data, without blurring the boundaries between continuous 
regions; we discuss the use of Markov random field models to embody the prior 
knowledge ;bout the shape and location of the discontinuities, and show how 
to adap : the general reconstruction algorithms developed in chapter three to this 
probler. W< also develop a special purpose efficient algorithm for this case, and 
discuss its pc rallel implementation. 

Chapter six is devoted to the problem of the reconstruction of depth from 
stereoscopic images. As in the previous cases, we first present a probabilistic 
formula ion )f the problem, and extend the general methods of chapter three for 
implementin ; a solution. Then, we develop special purpose algorithms that improve 
the computa ional efficiency. The performance of these algorithms is illustrated 
using beth sj nthetic and ’’real” images. 

Finally, n chapter seven, we summarize our results, and suggest areas where 
future nsear h may be fruitful. 
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Chapter 2 


LOCAL SPATIAL INTERACTION MODELS 


)n. 

to the success in the use of the probabilistic (and in particular, Bayesian) 
the solution of the class of reconstruction problems in which we are 
our ability to find a class of stochastic models (that is, random fields) 
following characteristics: 

babilistic dependencies between the elements of the field should 
illy localized. This condition is necessary if the field is to be used 
l surfaces that are only piecewise smooth; besides, if it is satisfied, 
instruction algorithms will be distributed, and thus, efficiently 
ntable in parallel hardware. 

s should be rich enough, so that a wide variety of qualitatively 
behaviors of the desired solutions can be modeled. 

tion between the parameters of the models and the characteristics 
rrcsponding sample fields should be relatively transparent, so that 
sis are easy to specify. 

1 be possible to represent the prior probability distribution Pf 
\ so that Bayes theory can be applied. 

1 be possible to specify an efficient Monte Carlo procedure for 
ig sample fields from the distribution, so that the ability of the 
represent our prior knowledge can be verified. 

ely, there is a class of models that satisfies these characteristics: the 
ovian Random Fields (MRF) on lattices. We will describe them in 
and we will also show how they satisfy the required conditions. To 
vill need two important results: the Hammersley-Clifford theorem, 
£d to conditions (iii) and (iv), and the Metropolis and Gibbs-sampler 
hich will permit us to satisfy condition (v). 
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andom Fields. 

cept of a MRF is a direct extension of the concept of a Markov process 
nensions and originated in the work of Ising (1925) on the construction 
:>r ferromagnetic phenomena. The definition for a two dimensional 
/IRF was introduced by Wong (1968), following Levy (1956) (see also 
L968), and in intuitive terms it says that a random field is Markovian 
>sed curve that separates the space into two regions, the knowledge of 
the field along the curve, makes the field in these regions mutually 

seful for our purposes (since usually we will be interested only in 
g the field at the sites of a regular lattice) is the definition of a discrete 
leralization of the concept of a Markov chain. A discrete Markov 
on a finite lattice is defined as a collection of random variables, which 
o the sites of the lattice, whose probability distribution is such that 
lal probability of a given variable having a particular value, given the 
rest of the variables, is identical to the conditional probability given 
‘ the field in a small set of sites, which we will call the neighborhood 
site. In formal terms we have the following (see Geman and Geman, 
o Woods, 1972 for an alternative definition): 

e a finite set of N sites, and G = {G a , s e S} be a neighborhood 
i.e., a collection of subsets of S for which: 

3r all s € S. 

' and only if r e G a , for all r, s e S. 

{F a , s £ 5} be any family of random variables indexed by s £ 5, and 
simplicity, that these variables take values on some finite sets {Q a } 
n can be extended, with some technical modifications, to the case of 
ate space). We will call any possible sample realization / : 

(/»!>•••) Isn ) 1 fai £ Qsi 
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Figure 2. Sites 1, 2, 3 and 4 are the neighborhood of site j 



1 of the field. Let f2 be the set of all possible configurations (i.e., the 
* and let P be a probability measure in n. F is a MRF with respect 

) > 0 , for all / € H ( (F = /) denotes the event: ( F a = f 9 for 
)• 

9 \ Pr = fr r 3 ) =■ P(Fg = fa \ F r = f r r £ Ca). 

from this definition , that if the size of the neighborhoods is small, 
itisfy the first condition we required from our class of models. The 
tion of a MRF from this definition (i.e., in terms of the conditional 
tiowever, is not very convenient because of the following reasons: 

i functions defining valid conditional distributions for a MRF cannot 
xarily, since they have to satisfy a set of consistency conditions (that 
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where Z is i 
U(f) is of th 


ayes’ rule; see Besag, 1972), and are, in general, very difficult to specify 
Dndly, although the joint probability distribution Pj can be uniquely 
from the conditional probabilities, its computation is, in general, a 
rivial task. Finally, there is no obvious intuitive relation between the 
onditional probability distributions and the qualitative behavior of the 
• 

:ome these difficulties, we need an alternative way of defining a MR.F. 
as follows. 

Gibbs Equivalence. 

: need the following definition: 

system of neighborhoods on a lattice, we define a "clique" C as either 
Dr a set of sites of the lattice, such that all the sites that belong to C are 
f each other. For example, on a 4-connected lattice (Fig. 2), the sites 
form the neighborhood of site ;, and the cliques are sets consisting 
de sites, or of two (vertically or horizontally) adjacent sites (nearest 
;ee Fig. 3). 

It we are looking for is contained in the Hammersley-CIifford theorem 
and Clifford, 1971) which states that if F is a MRF on a lattice 
ct to the neighborhood system G, the probability distribution of the 
s (sample functions) generated by it will always have a definite form, 
of a Gibbs distribution: 

P/U) = je-WP 

normalizing constant, j3 is a parameter, and the "Energy function" 

: form: 

W = EW) 

c 
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is a MRF, we have that 


P{ 0) > 0 
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so that (he quantity 


is Well defined. 


pu) 

P(0) 


with 


step is to note that we can always write: 


p{n 

m 


=,w> 


QU) = E /.<?,(/.■) + E E /./><?,/(/,. /y) +.. 


* 3 


If • • */n) 


for some furllctions G{, G tJ ,.... 


Now, fo 
/hi as being 


Using B 


■ any configuration / and any selected site i, we define the configuration 
qual to / everywhere, except possibly at site i, where it is equal to 0: 

/I I = {/l, . . */t—1> 0, fi+lt • • •> /r»} 

iyes rule we find that: 

P[f) 

P(f®) P(o I /y,;V 0 -P(/y,;V*) 

^(/* I //ii 7^ *) mm 

= >(6|7y f ,vo = exp[Q(/) " e(/ )] = 

= exp[/,•<?,(/,) + Y1 fifjGijift, fj) + .. •] 


Note thht blcause of the Markov property, the above quotient of conditional 


probabilities 
of site i. 


an depend only on the value of / at those sites which are neighbors 


Now, su >pose l is not a neighbor of i, and consider a particular configuration / 
which isi eqiu 1 to 0 everywhere, except at sites i and l. By the above considerations, 
we have: that 

QU) - Q(f ii] ) = f.GiUi) + fUiGuUi, fi) 
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lilar reasoning, one can show that f rn ) can be different 

if the sites i, j ,..m are neighbors of each other, i.e., if they belong to 
jue. The proof is completed by defining: 


• • •> frn) fit • • •fm(*i,...m{fit • • 'frn) 


portant to note that whereas the functions defining valid conditional 
for a MRF cannot be chosen arbitrarily, the form of the potentials 
stricted in any way, and can be used freely to specify the required 
the field / (which is what one does in practice). The relation between 
als and the conditional probabilities is given by the following formula 
vs from Bayes rule): 


Fi = fi 


. , .. = e*p[-i Ec=.-ec v c(f)} 
i,} * 1 exp[-iEc’.ec Vfc(/«)1 



he set of allowable values for the state of F t -, and f q is the configuration 
al to q at site i, and coincides with / everywhere else. 

-e other ways of representing certain classes of MRFs. For example, 
■) has shown that every homogeneous Gaussian MRF defined on a 
satisfies a difference equation of the form: 


fnm ) > ^ klfn—k,m—l T ^ nm 
D(P) 


the value of the field at site nm and u is a (non-white) stationary 
whose autocorrelation function satisfies: 

( c, m = n = 0 

^mn c > (771,71) G D(P) 

0, elsewhere 


D(P) = {(&, /) : 0 <k 2 + l 2 <P 2 } 
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3. Generatior 

3.1. The Met 


r , fc, if n = k and m = l 

E[fnmU k ,\ = \ 

10, otherwise 

h kl can be interpreted as the coefficients of the linear minimum mean 
estimator of f mn given its neighbors out to distance P, and u as the 
ror. 

resentation (called a "Conditional Markov" (CM) model by Kashyap 
nen be used to generate sample functions (Woods, 1972 also presents 
based on the discrete Fourier transform, for the generation of sample 
f the field u , and for the computation of the joint distribution for /). 
atisfies a difference equation of the form: 

fnm = y! ^ klfn—k t m—l “h ^nm 
D(P) 

} are independent random variables, is called a "Simultaneous 
e" (SAR) model by Kashyap ( a similar representation can be obtained 
i exponential autocorrelation functions; see Habibi, 1972). Although 
that for any homogeneous SAR model it is possible to find a MRF 
e spectral density, albeit with a different neighborhood structure, it 
very difficult to compute the joint distribution explicitly from the 
itation. On the other hand, the Gibbs representation has the following 

fectly general: it applies to discrete valued fields, and it can be 
[leralized to the case of continuous valued ones. 

to generate sample functions from the distribution (we will discuss 
is for doing this in the next section). 

e posterior distribution is also a Gibbs measure, the optimal 
i can be obtained directly from the posterior energy function. 

s reasons, this is the representation that we will adopt 
of Sample Configurations of MRFs. 
opolis and Gibbs-Sampler Algorithms. 
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iiest successful Montecarlo procedure for the generation of sample 
MRF’s was developed by Metropolis et al. (1953) for the numerical 
of thermodynamic properties of many-particle systems in thermal 
To describe it, let us consider a system with N particles, each of which 
iy one of a finite number of allowable states. Let fj denote the state of 
:le (we will refer to the N -vector / as the global configuration of the 
let U(f) be the corresponding energy. 

lie idea of the algorithm is to construct a Markov chain whose states 
to the global configurations of the system at discrete time intervals 
5 a well known fact, from statistical physics, that when the physical 
hermal equilibrium at a given temperature T, its configurations will be 
ccording to the Gibbs measure: 

<0 = ex ?[——7jp-] ( 2 ) 

e want n (/) to be the invariant measure for our chain. If the chain is 
if it is possible to go between any two states in some fixed number of 
vill be the unique vector satisfying: 

7rPc = 7T 

the transition matrix of the chain (see Kindermann and Snell, 1980). 

ice a system in equilibrium looks the same if we reverse the time 
require that the associated chain be reversible, that is, 

r(/(n + 1) - j | f(n) = i) = Pr(/(n - 1) = j | /(n) = i ) 

■ chain, reversibility is equivalent to the ’’detailed balance” condition: 

* (f)Pc(f,n = *(r)Pc(f',f) (3) 

/’ are any two global configurations. This condition means that, if 
a large collection of isolated, identical systems, each one in thermal 
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equilibrium|at the same temperature (the so called "Canonical Ensemble"), the 
number of systems going from state / to /’ must equal the number of systems going 
from / to . . This condition is also sufficient for the convergence of the chain to 
the des; red jibbs measure. 

Hie alg 3rithm proposed by Metropolis generates a regular chain that satisfies 
(3). It is as I allows: 

Si ppos : that we visit the particles of the system (i.e., the sites of the lattice) in 
some nndoi i sequential order (for example, we choose the next site to be visited at 
random witl uniform distribution). When a particle j is visited, we update its state 
as follows: 

(i) Choose a new state 7y randomly from the set of allowable states using a 
im fornly distributed random number. 

(ii) Compu e the increment in energy AEj that results from moving the state 
of the j h particle from /y to /y. 

(iii) If AEj < 0, make the move, i.e., set /y = /y. 

If AEj > 0, generate a new random number r, uniformly distributed 
betweei 0 and 1. 

If r < < set fj = /y. 

If r > t ~ AI ^}/ T t leave /y unchanged. 

If we c enote by q(f, f) the probability of proposing the state f when the 
system is at tate / (i.e., the probability of visiting particle j , and selecting the state 
fj for it; nc te that q must be a symmetric, irreducible stochastic matrix, so that 
?(/> 7) == qt /), by construction), we have that 

Pc{f, 7) = q{f> 7) min(l, e-* u / T ) 

PcCf, f) = 9(7, /) rnin(l, e AU / T ) 

where 

AU = U(f) - U(f) 

Therefore, if AU <0, 

Pcif.f) = ?(/,/) and PcO.f) = q(f,f)e AU / T 
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;nt construction, called the "Gibbs sampler" has been proposed by 
Geman (1983) (see also Besag (1972)). In this scheme, too, at each 
one site is modified; its new state, /y is selected at random from the 
istribution given by equation (1). These authors show that provided 
keep visiting every site, (i.e., that we update its state "infinitely often") 
chain is ergodic, and its invariant measure is given by (2) (note that 
s not required in this case). It is not difficult to see that for binary 
nethod is equivalent to the heat bath algorithm. 

1 Mechanics Interpretation. 

n intuitive grasp on the way these algorithms work, it is useful to 
jsults from statistical mechanics (see, for example, Reif, 1965). When 
: system (i.e., a system with a large number of degrees of freedom) 
equilibrium at a given temperature T, its state / will be such that 
e energy F is minimized. The relation between F(f) and the internal 
)f the system is given by: 

F(f) = U(f) - TS 


31 










where 


tropy S is: 


S = In n (U) 


and n([/) is 
equal to U. 

From tl 




with 


the total number of feasible configurations of the system with energy 

is relation it is clear that at high temperatures, a system in equilibrium 
disordered, high energy configuration (which will have a high value of 
low temperatures, the dominant tendency will be towards low energy 
robability distribution of the equilibrium energy is given by: 

Pu{U) = (U) 

i constant. Since Q(-) is a rapidly increasing function of U , and the 
onential is rapidly decreasing, Pu will be sharply peaked around a 
Using the fact that Q(U) = 0(E/ n ), where n is the number of degrees 
f the system, one can show that the relative width AU of this peak will 
proportional to the square root of n\ 


AU 1 



holds, in fact, not only for the energy, but for other related 
lical properties as well). This means that, for large n, the Metropolis 
lpler) chain will generate (asymptotically) configurations whose energy 
:o U*(T ), which is an increasing function of T. 

rate this, let us consider a binary system on a four-connected square 
energy function is given by: 

1 c 

Vclfi.fi) = {- U 
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1.2 (righ 


Ficiure 4 
) timl 


Sample patterns of the two-dimensional Ising model at 0.8 (left), 1.0 (center) and 
Is the critical temperature. 


where 
dimeni 
contrib 
lattice 

In 


<t? ranges over all the nearest neighbor cliques of the lattice (this is the two 


onallIsing model with ’’free boundaries" — since the only interactions that 
ute ti the energy are those between elements of the field that belong to the 
wHich we will later discuss in detail). 


figule 4 we present typical equilibrium configurations generated at three 
different ten peratures using the Metropolis algorithm with random updating order. 
The temper* tures used correspond to 0.8,1.0 and 1.2 times the critical temperature 
for thiii mo< el (the critical temperature is defined as the maximum value of the 
temperature for which the effect of fixed conditions at the boundary of a square 
lattice is felt it the center, no matter how large the lattice is. For the two-dimensional 
Ising nfodel It equals 2.273). 

the |mit of very large lattices, the equilibrium energy per spin (which is 
to the total length of the boundaries between "black" and "white" 


In 

propor: 

regions) 


onall 


is given by (see Wannier, 1959): 


= ~COthi[l ± i(l - a 2 ) 1 / 2 tf(a)] 
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Figure 5. Equilibrium values of the energy (a) and average density (b) for an infinite Ising 
net (from Wan tier, 1959) 6 

where we tak 2 the + or - sign, above and below r c , respectively. A; is the Boltzmann 


constant; a it given by: 


a 


2sinh(l/r) 

cosh 2 (l/T) 


and K(-) is the complete elliptic integral of the first kind (see, for example, 
Hildebrand, .976). 


The aveilige density of black" elements can be computed by the expression: 

C4T) “ i 11 + Sf$m |ital ’ (1/I ' 1 " 1 |l ‘" 

The shaf e of these functions is illustrated in figure 5. 


From a lualitative viewpoint, one can see that the temperature, which is the 
only free parAneter of this model, controls the granularity (average cluster size and 
cluster densiti) of the sample patterns. 


Other exs mples of patterns generated with these algorithms (or some variations 
of them) may be found in Cross and Jain (1983) and Hassner and Sklansky (1980), 
where they ar : used as models for texture; in Geman and Geman (1983) as models 
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for piecewis; constant images, and in Grenander (1983), where they are used to 
produce mo e complex patterns. 

3.3. Continu >us Valued State. 
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the two algorithms presented in section 3.1 can be generalized to the 
he state of each particle can take any real value on a compact set 
l interval) at the expense of their computational efficiency. A different 
t seems promising is based on the fact that a vector / which obeys the 
ferential equation: 

df — -gra dU[f)dt + \frrdw (4) 

vector Wiener process with unit variance (a collection of independent 
)tion processes), will be, under suitable smoothness conditions on 
d asymptotically (as t t °°) according with the Gibbs measure (1) 
ler 1984; Geman and Hwang, 1984). This means that we can use a 
lulation of (4) (see Wong and Zakai (1965)) to generate the desired 
approach has two interesting advantages, that result from the fact that, 
al simulation, the increments dw are approximated by independent, 
;tributed Gaussian random variables: 

need to generate Gaussian random numbers, for which efficient 
is exist 

can be updated at the same time, so that efficient parallel 
citations can be adopted. 

mbility distribution of the configurations generated by the system at 
le can, in principle, be obtained by solving an appropriate system 
erential equations (i.e., the Kolmogorov equations; see for example, 
lylor, 1981); this will not be practical in most cases, however, so that 
ivergence of this algorithm will have to be assessed in an experimental 
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We will] 
section can 


now describe how an extension of the techniques presented in this 

t e used to find the global minimum of arbitrary energy functionals. 
iow in the next chapter, this method will be particularly useful for 
in the variational principles which represent the Maximum a Posteriori 


minimization 


estimated solution to a reconstruction problem. 
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4.1. Discrete 

Geman < 
at the rate: 


Annealing and Global Minimization. 


Simulated annealing is a new technique, developed by Kirkpatrick et al (1983) 
on of combinatorial optimization problems. It is based on the idea 
functional of N variables, each of which can take values on some 
be considered as the energy function of a physical system whose state 
:o a particular value of these variables. Therefore, we can use, say, 
is algorithm to generate, at any given "temperature” T (which now 
rameter of the optimization process) samples from the corresponding 
e. Since as T | 0 this measure converges to an impulse (or set of 

R esponding to the state (or states) of minimum energy, the state of the 
mal equilibrium at zero temperature will correspond to the value of 
zes U(f) globally. 


Ulr 


us difficulty, however, is that attaining thermal equilibrium might take 
|ne at low temperatures. Kirkpatrick’s idea was to start at a relatively 
ure (where thermal equilibrium is reached very fast), and then, to 
system, until "freezing" occurs and the state stops changing. 


tile 


Valued State. 


where n is th : 
Gibbs sample ) 


Geman (1983) were able to show that if the temperature is lowered 


T = 


log(n + 1) 


(5) 


number of iterations, and C is a constant, this algorithm (using the 
will in fact converge (in probability) to the set of states of minimal 
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energy. The) also showed that this chain is asymptotically ergodic in the sense that 
for any real 'alued function Y of the global state at time t, f(t ), we have: 

i™ l n t nm) = /„ Y { .)<tp,(u) 

' ' V 

where n is he set of allowable global states. This means that we can use time 
averages to e timate ensemble averages. Similar results have been obtained by Gidas 
(1984) for th j Metropolis and heat bath algorithms. 

The mil imal value of the constant C in equation (5) for which convergence 
can be guara iteed has not been determined in general. The value found by Geman 
and Geman s: 

C = iVA 

where N is he total number of sites in the lattice, and A is the largest absolute 
difference in energies associated with pairs of global configurations that differ at 
only one site This value, however, is too large to be of any practical use in most 
applications. 3idas (1984) has shown that if U has not more than two local minima, 
C can be cor iputed as: 



where A’ is tl e minimal energy change between a local minimizer and a neighboring 
(in the sense that it differs at exactly one site) configuration. He also conjectures 
that this expi jssion holds in general, but this result has not been confirmed. 

In a rece it paper, White (1984) characterizes the initial annealing temperature 
in terms of th ; standard deviation of the ’'density of states” (the number of possible 
states of the s; stem, per unit energy, for each value of the energy) when this function 
is approximal ^ly Gaussian (which seems to be the case for a large class of systems). 
In some parti ular cases this value can be determined analytically from the structure 
of the problei l, but in general, it has to be computed numerically from a simulation 
of the system at high temperature. 

For the < lass of systems in which we are interested, we have found, by a trial 
and error pro edure, that a value of C equal to 1.5 times the natural temperature of 
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the system ( ,e., the temperature associated with the Gibbs distribution of the prior 
MRF model produces a reasonable convergence behaviour (of the order of 500 
iterations), b it clearly, more research, both theoretical and experimental is needed 
in this area. 

Another important factor which determines the computational efficiency of 
simulated an ealing is related to the difficulty in computing the increment in energy 
A Uj associat d with a change in the state of the j th variable. If the energy function 
comes from t le probability measure of a MRF, the computation of A Uj will require 
only the state > of the variables in the neighborhood of j. Suppose now that we color 
the sites of th ^ lattice in such a way that any two neighbors will always be of different 
color. In a p irallel implementation we can, in principle, update the states of all 
the sites that are of the same color in a simultaneous way. The minimum number 
of colors nee led to satisfy this condition is called the "Chromatic Number" of the 
graph that d iscribes the neighborhood structure of the MRF, and it is bounded 
below by the size of the largest clique of the system. This number, then, determines 
the minimun number of steps that are needed in a parallel machine to update the 
state of the v hole lattice. We will analyze these implementations in more detail for 
some particu ar examples in the next chapters. 

4.2. Continue iis Valued State. 

All the i /ailable convergence results for the annealing algorithm hold only for 
the case whe e the set of allowable values for the state of each variable is finite. If 
this set is inf nite, but compact, we can still use these results to find approximate 
solutions by discretizing it. However, the computational complexity will increase 
as we increaj e the resolution of this discretization. An attractive alternative is to 
generalize th; approach discussed in section 2.2 by making T in equation (4) 
time depend* nt A convergence proof for this modified scheme, for smooth energy 
functions tha satisfy appropriate boundary conditions, can be found in Geman and 
Hwang, 1984 
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e presented a class of probabilistic models with local dependencies 
present prior generic knowledge about the solution of a reconstruction 
e class of MRF's on finite lattices. We have seen how they can be 
)ecified by defining arbitrary "potential functions" which are supported 
es of the associated neighborhood system. It is thus easy to define 
-Ids with a wide range of different behaviors. For example, if the only 
dge that we have is that the reconstructed surface should be piecewise 
may use a 4-connected lattice with Ising potentials: 

r-l, if \i-j\ = l and /,• = fj 

V c (fi, fj) — 1 1 , if \i — j\ = 1 and /,• ^ fj 

lo, otherwise 

ase, the natural temperature of the system will index a one parameter 
ds with varying degrees of granularity. 

surfaces can be modeled using the same neighborhood system, but 
c potentials: 



if \i — ;| = 1 
otherwise 


nplicated, non-isotropic patterns can also be modeled, using slightly 
Drhoods (as in Cross and Jain, 1983). Also, as we will see in chapter 
iate choice of the lattice and the neighborhood system, permits one 


7 to model sets of piecewise smooth curves on the plane. Using this 
it is possible to model the behavior of a piecewise smooth function 


two-dimensional lattice (a "piecewise smooth surface") by coupling 
Dne for the smooth portions, and another for the curves that bound 


ed how the probability distribution of the configurations generated by 
e same form as the one associated with a macroscopic physical system 
uilibrium, so that one can use Monte Carlo procedures that simulate 
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of such systems to generate sample functions of arbitrary MRF’s. The 
•roperty of the models imply that the computations performed by these 
ire local in nature (the updating rule for each site depends only on 
its neighbors), so that, in principle, efficient parallel schemes can be 
their implementation. We will examine this question in detail in the 
, where we discuss the use of MRF models and Bayes theory for the 
don of reconstruction problems. 


40 



Chapter 3 


OPTIMAL BAYESIAN ESTIMATORS 


1. Introducti >n. 

The use of the Bayesian approach for the solution of reconstruction problems 
requires the development of the following items: 

(i) A prior probabilistic model for the functions to be reconstructed. 

(ii) Stochas ic models for the observation processes. 

(iii) Approp date loss (error) criteria. 

(iv) Estimat >rs that are optimal with respect to (i), (ii), and (iii). 

(v) Efficien algorithms for the computation of these estimates. 

In the Drevious chapter, we discussed item (i), and presented a class of 
probabilistic models that can be used very effectively to encode prior generic 
constraints a)out the solutions of reconstruction problems. In this chapter we will 
develop the emaining necessary ingredients that are necessary to perform optimal 
reconstructic ns in the general case. 

First of ill, let us formulate the class of problems of interest in a precise way, 
and present general stochastic model for the observation process. 

2. Problem I cumulation. 

We me tioned in chapter 1 that there is an important class of perceptual 
problems wl ose solution can be found by reconstructing a function / : R n 
R m on a fin te set of points that lie inside a compact domain 0 C R n . Although 
the methods fiat we will develop are, in principle, perfectly general, for the sake of 
clarity we wi 1 confine ourselves to the important particular case when n — 2 and 
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ire, therefore, interested in reconstructing the value of a function / at 
the N sites of a lattice L (we will denote die value of the function at 

/ti¬ 
le Model for the Observations. 

tssume that we have a set of observations g on a subset S of the sites 
it the process by which these observation are obtained can be modeled 

9j — n j) , J <= S’ ( x ) 

ffy (.) is an operator with local support that represents some kind of (in 
-invertible) degrading operation (such as blurring); O' is an operation 

iith respect to n, (so that n y = *-'{*, H,if)))- * re P resent ’ 
e, noise addition or multiplication followed by a memoryless non¬ 
formation. nj represents a scalar noise process with known probability 
p We will assume that nj is independent of n,-, for all t ^ j, and 

is independent of /. 

/, the conditional probability distribution for the observations P, lf will 

r. 

^1/(9;/)= II 

its 

ling that P m (n,) > 0 for all *, and all possible values of n u we can define 
ns by: 

$i(f, gi) = — In P n i(^f l {9i, Hi (/)) ^ 

can write the conditional distribution as: 

P g \f{g) f) = ex Ph £ *»'(/> ^ 

ies 

nple, consider the case of additive , zero mean white Gaussian noise. We 


HAD = Si 













Pni{x) = 


^(a, b) = a -f b 

1 ' 2 /o_2i 


\/27r 


exp[— 


7TOT 


1 


^1/(9.' /) = n — ex P[ — (/*' - 9.) / 2<r 1 

t€5 yfi/RO 

= exp[- X^{ln(\/27rcr) 4- ^(/t “ 9*) 2 }] 


i es 


2.2. Posterio Probability Distribution. 

Since w| are using a MRF model for the field /, its prior distribution will be 
of the form: 


with 


Pf if) = Jr exp[-^f/ 0 (/)l 


Uo(f) = E Mf) 

c 


(4) 


where C ran ;es over the cliques of the neighborhood system of /. 
Using B lyes rule, we find that the posterior distribution is: 


tHe 


Using 
a constant fc 
also follow a 


with 


Where Z P id 


expressions (3) and (4) for P f and P g \ f , and recognizing that P g (g ) is 
J- a given set of observations, we get that the posterior probability will 
Gibbs distribution: 


Pf\,U< 9 ) = 


Pf(f)P, |/(9; /) 

PM 


•P/Ij(/; 9 ) = Yp ex P[ _u H/; 9 )] 


Vr(f-,g) = ^Uo(f)+Z *(/•*) 

J o , e s 

a constant, and the functions 4> t are defined by (2). 


(5) 


( 6 ) 
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We can tow provide a physical interpretation of the posterior distribution, by 
t mt, while the prior distribution (4) describes the behavior of a free field 
<| uilibrium (see section 3.2 of chapter 2), the distribution (5) describes 
of the same field coupled with a fixed (but spatially varying) external 
vmlue is given by g. The functions $ t , whose magnitude depends on the 
can then be interpreted as the coupling strengths between the two 
ibupled system is also Markovian, and if 


considering 
in thermal e 
the behavior 
field whose 
noise varianq 
fields. This o 


its neigh born 


The im 
in the follow! 
by observing 
question in d 
we are interested in. 


3. Cost Func 


The Bay 


Hi(f) = //,-(/,•) for all i € S 


pod structure will be identical to that of the original field. 


fjprtance of this interpretation lies in the fact, which will be proved 
ng sections, that the optimal estimate for / can be obtained simply 
the equilibrium behavior of this coupled field. Before considering this 
"tail, let us define the appropriate cost functionals for the applications 


ionals. 


f ;sian approach to the solution of reconstruction problems has been 
veral researchers. In most cases, the criterion for selecting the optimal 
estimate has been the maximization of the posterior probability (the Maximum a 
Posteriori or VlAP estimate). It has been used, for example, by Geman and Geman 
(1984) for th s restoration of piecewise constant images; by Grenander (1984) for 
pattern recon jtruction, and by Elliot et. al. (1983) and Hansen and Elliot (1982) for 
the segments ion of textured images (a similar criterion — the maximization of a 
suitably defir ed likelihood function — has been used by Cohen and Cooper (1984) 


for the same 


purposes). 


Since thp use of this criterion defines the optimal estimator as the global 
minimizer oi the posterior energy U P (equation 6), it is closely related to the 
standard regularization method that we discussed in chapter 1. Indeed, if we assume 
quadratic potentials for the prior MRF model, the term U 0 {f) corresponds to a 
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iness assumption (the ’’stabilizing functional”), and if the observations 
by additive Gaussian noise, the term £ 4> t (/, &•) will also be quadratic, 
ill have a unique minimum. For more general prior and observation 
vlAP estimator may be considered as an extension of the standard 
approach. Thus, the variational principle proposed by Blake (1983), 
ragmatic basis, for the reconstruction of piecewise constant images is 
o the one derived by Geman and Geman (1984). Even in this case, 
precise probabilistic formulation in the latter case is preferable, since 
precise interpretation of the parameters, and a practical means for 
adequacy of the prior assumptions (via the experimental analysis of 
• 

other cases, a performance criterion, such as the minimization of the 
1 error has been implicitly used for the estimation of particular classes 
ixample, for continuous-valued fields with exponential autocorrelation 
rupted by additive white Gaussian noise, Nahi and Assefi (1972) and 
i have used causal linear models and optimal (Kalman) linear filters 
s reconstruction problem. 

mization of the expected value of error functionals, however, has not 
an explicit criterion for designing optimal estimators in the general 
show that this design criterion is in fact more appropriate in our case, 
ing reasons: 

s one to adapt the estimator to each particular problem. 

)ser agreement with one’s intuitive assessment of the performance 
mator. 

3 attractive computational schemes. 

now propose design criteria for two particular problems: image 
and surface reconstruction. 


3.1. Error Criierion for the Segmentation Problem. 
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that fi does 
example, the] 
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error e 9 as: 


|sses. Let / t - denote the class to which the i th element belongs. The 
problem is to estimate / from a set of observations {gi,.. .,g p }. Note 
]|iot necessarily correspond to the image intensity. It may represent, for 
texture class for a region in the image (as in Elliot et. al., 1983), etc. 


t able criterion for the performance of an estimate f is the number of 
are not classified correctly. Therefore, we define the segmentation 


where 


3.2. Error Cn 


on finite seta 
of an image 
should be c 
total square 


N 


«.(/. /) = £ (i - s{fi - /.)) , h, fi e Qi 

i=i 


(7) 


%) = 


ri, if a = 0 

.0, otherwise 


( 8 ) 


terion for the Reconstruction Problem. 


In this apse, we also consider a field / with N elements which can take values 


{Qi }, but now we assume specifically that /,• represents the intensity 
'or the height of a surface) at site i. This suggests that an estimate / 
nsidered "good” if it is close to / in the ordinary sense, so that the 
error: 


will be a real 


Let us ri 


bw derive the optimal estimators for these error criteria. 


To deri4 
first present 
which states 


N 


*r{f,f)= E(/.-/.) 2 
*=1 

onable measure for its performance. 


(9) 


4. Optimal Bayesian Estimators. 


e the optimal estimators with respect to the criteria stated above, we 
|he general result (which can be found, for example in Abend, 1968) 
hat if the posterior marginal distributions for every element of the field 


are known, |he optimal Bayesian estimator with respect to any additive, positive 
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precise terms, we will consider cost functionals C(f,f) of the form: 


c(f, f) = £ CiUi, /,) 

>'€£ 


( 10 ) 


(= 0 , 

;iM) {> 0 , 


if a — b 

if a 7 ^ b , for all i 

We will assflme that the value of each element / t of the field / is constrained 
to belong tc some finite set Qi (the generalization to the case of compact sets 
is straightfoi ward). The Optimal Bayesian estimator f with respect to the cost 


functional C 
possible / an 


We now have: 


Theorem 1: 


The optimal 
C can be foill 
element, i.e.. 


for all sj^q 


is defined as the global minimizer of the expected value of C over all 
d g : 

£(/*) = / c{f, g) = 


inf f fg c (f, /mjf, 9 ) 


( 11 ) 


pstimate of a field / with respect to the positive definite cost functional 
nd by minimizing independently the marginal expected cost for each 


/*• = 9 • ]C c i( r > v) p i( r I sO ^ J2 c '*( r » s ) p *( r I 9) 

reQi r£Qi 

and for all i G L. 


Pi(r | g) is thp posterior marginal distribution of the element i: 

p i( r I s) = p n »(/; s) 

f:fi —r 


( 12 ) 
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7 is positive definite, we can rewrite as: 
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nal estimators for the error criteria defined in section 3, can be easily 
this result: 

se of the segmentation problem, we put 
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lis estimate the "Maximizer of the Posterior Marginals" Cf m pm)' 
econstruction problem, we set: 
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reQi reQi 
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timal estimate is: 
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for all q (14) 

lis estimate the "Thresholded Posterior Mean" Qtpm )• 

t these results still hold if the sets Qi of allowable values for each 
le individual cost criteria C* are not the same for all i. In particular, 
ie that the index i varies over the union of two lattices: 

* € £1U L* 
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:ld at the sites of L\ represent the height of a piecewise smooth surface, 
es of La, take an integer value to indicate the presence (and possibly 
I of a boundary between two adjacent continuous patches (see Geman 
1984; we will explain this construction in detail in chapter 5). If we 

mixed error functional: 
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The m 
formidable 
and the met 
size. In the 
permit us t 




ieL x 

ieLz 


in obstacle for the practical application of these results, lies in the 
omputational cost associated with the exact computation of the marginals 
n of the posterior distribution given by (5), even for lattices of moderate 
next section we will present a general distributed procedure that will 
approximate these quantities as precisely as we may want 


5. Algorithms 


The al] 
Gibbs Sam 
behavior of 
chain gene 
posterior d 
example, l 
chain will 
gets large, 
by: 


orithms that we will propose are based on the use of the Metropolis or 
>ler schemes that we presented in chapter 2, to simulate the equilibrium 
the coupled MRF described by equation (5). We recall that the Markov 
ated by these algorithms is regular, and their invariant measure is the 
stribution P fl} . The law of large numbers for regular chains (see, for 
emeny and Snell, 1960) establishes that the fraction of time that the 
pend on a given state / will tend to P Ag {f; g) as the number of steps 
ndependently of the initial state. This means that we can approximate f 


£ f< ] 

n — k t==k 


(15) 


and the posterior marginals by: 


p i(l I ?) » ^ £ *(/?* - «) 


(16) 
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where fW i 
and k is the 


the configuration generated by the Metropolis algorithm at time t, 
inie required for the system to be in thermal equilibrium. From these 


values, f MI > 


and f TPM can be easily computed using (13) and (14). 


This pr )cedure is related to the use of simulated annealing (see section 4 of 
chapter 2) i or finding tine global minimum of U P (i.e., the MAP estimate: see 
Geman and Geman, 1984). In our case, however, we are interested in gathering 
statistics abc ut the equilibrium behavior of the coupled field at a fixed temperature 


T = 1, rath :r than in finding the ground state of the system. This fact gives our 
procedure s< me distinct advantages: 


1. It is difficult to determine in general the descent rate of the temperature 
(anneal ng schedule) that will guarantee the convergence of the annealing 


process 
Since v 
issue b< 

2. Sine 
the vah 



in a reasonable time (it usually involves a trial and error procedure), 
e are running the Metropolis algorithm at a fixed temperature, this 
comes irrelevant 

: in our case we are using a Monte Carlo procedure to approximate 
les of some integrals, we should expect a nice convergence behavior, in 
;e that coarse approximations can be computed very rapidly, and then 
to an arbitrary precision (in fact, it can be proved (see Feller, 1950) 
: expected value of the squared error of the estimates (15) and (16) is 


inversely proportional to n). 


The m; in disadvantage of this procedure is that in the case of the segmentation 
problem, a large amount of memory might be required if the number of classes 
per elemen m is large (we need to store the N(m — 1) numbers that define the 
posterior m irginals). 


With r aspect to the relative performance, we point out that in many cases, 
particularly for high signal to noise ratios, the MAP estimate is usually close to the 
optimal on*. If the noise level is high, however, the difference in the performances 
of the two < stimators may be dramatic. This is illustrated in the example portrayed 
in figure 6: panel (c) represents the MAP estimate of the binary MRF (a) from the 
noisy obser /ations (b); it is clear that the approximations to the MPM estimates 
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els (d) and (e) are better than the MAP from almost any viewpoint. An 
anation for this behavior comes from the fact that the MAP estimator 
linimizing the expected value of a cost functional Cmap{L f) which is 
only if fi = ft for all i, and is equal to, say, M otherwise, if the signal 
is sufficiently high, the expected value of the optimal segmentation 
very close to zero, so that Jmpm and /map coincide. In a high 
n, however, the MAP estimator will tend to be too conservative, since 
point it is equally costly to make one or one thousand mistakes. The 
:or, in contrast, can make a better (although more risky) guess, since 
/ mistakes has only a marginal effect on the expected cost. We will 
5 example, and analyze in detail the relative performance of both 
he next chapter. 


6. Computational Complexity and Parallel Implementations. 


We hav< seen how the optimal solutions of reconstruction problems , for a 
large class of cost criteria, can be obtained from the observation of the evolution 
of the Markc v chain generated by the algorithms presented in chapter 2. In this 
section, we w ill discuss die following questions: 

(i) Which o 'these algorithms is the best one to use on a serial machine, from 
the view )oint of the computational efficiency. 

(ii) Which o le is best suited for an implementation in parallel hardware. 

We will dso describe a parallel machine that is currently under construction at 
Thinking Ma :hines Corporation and at the MIT Artificial Intelligence Laboratory: 
the ”Connec ion Machine” (Hillis, 1985), and present estimates for the execution 
time of these algorithms in that particular piece of hardware. 

6.1. Serial C( mplexity. 


Suppose 


we are running our algorithms on a serial machine. In the three cases 


(Metropolis, Heat Bath and Gibbs Sampler), we first have to select the next site 
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Figure 6. la) Sample function of a binary MRF. (b) Output of a binary symmetric channel 
(error rate: 0.41 (c) MAP estimate, (d) Monte Carlo approximation to the MPM estimate, (e) 
Deterministic a iproximation to the MPM estimate. 

whose state 1 as to be updated. Assume it is site i. Let &U q denote the increment 
in the poster or energy associated with replacing the value of the state of the I th 
element by tl e value q. Using (6) and the expression for U 0 of (4), we get: 

= k £ OW/W) - v c (f)) + ft) - *,(/, ft) (15) 
*0 c.iec 

where 



Let C(AU) dinote the computational cost of evaluating (15). 
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-ssary steps for updating the state of site i are, in the Metropolis scheme 
.1 of chapter 2): 

le candidate state q from the set Q t (generate a uniform pseudo¬ 
number in the range (0, |Q t |], with cost C(prn), and load q from 
with cost C(load)). 

e AU q . 

r AU q > 0 (cost: C(comp)). If not, set U = q. Otherwise, go to 
e exp[-AUq] (cost: C(exp)). 

e a new uniform pseudo-random number in the range (0,1). 
e it with exp[— AU q ], 

re, we have that the total updating cost for the Metropolis scheme, C M , 

Cm > C{AU ) + C(prn) + C(comp) 4- C(load) 

Cm < C(AU) 4- 2C(prn) 4- C(exp) 4- 2 C(comp) 4- C(load) (17) 

Heat Bath scheme, steps (i), (ii) and (iv) are identical, and step (iii) is 
remaining steps are in this case: 

erate a new uniform pseudo-random number r in the range (0,1 + 

U q }} 

> i, set fi = q\ otherwise, leave /, unchanged, 
iating cost for the Heat Bath scheme, C H b is then: 

Chb — C(AU) + 2C(prn) + C{exp) + 

C(comp) + C(add) 4- C(load) (18) 

general, it will be higher than Cm, since 

C(exp) > > C(comp) 

Gibbs Sampler, we select the new state by generating a pseudo-random 
ich takes values on with probabilities given by the conditional 
(equation (1) of chapter 2). To do this efficiently, we rewrite this 
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equation as: 


(Note that A 
Let Qi = 


The new stat 
r in the rang 


The compu 
Ccs - 


note that we 
array a. 

If N is 
estimate, the 


where C(seh 
state is goin$ 
pseudo-rand 
requires onl> 
using a deter 



aj = cLj—i + exp[— AU q .] , j = 1 ,..M 

/ t is now computed by generating a uniform pseudo-random number 
(0, a M ], and putting: 

7» = <?j ; r6(ay_i,ay] 


itional cost will be: 

[M — 1)[C(AU) + C(exp) -f 2 C(add) + 4 C(load) + C{comp)\ -f 

+C(prn) (19) 

are including the overhead cost incurred by the use of the auxiliary 


he size of the lattice, and we perform n iterations to compute our 
total cost will be: 

Cx — N • n • ( C(update ) -f C(select) + C(overhead)) (20) 

zt) is the cost associated with the selection of the next site whose 
to be updated. This selection involves the generation of 2 uniform 
m numbers in the first two cases, whereas for the Gibbs sampler it 
a couple of additions, since in this case we can select the next site 
ninistic rule, such as lexicographic order (see section 6.3 below). 
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Since C{ 
Metropolis a 
as the size o 
needed to ge 
faster in the 
latter case w 
sampling of i 

A rigoro 
on the natun 
needed to cl 
parallel impl 
We will justi 

6.2. Parallel 

A neces: 
Markov chai 
to the Gibbs 
updated at t: 
also sufficien 
update simul 
algorithm in 
The total exe 


where K is 
neighborhood 
to color the i 
Note that if 
may use a nt 


update) is the dominant cost, apparently one should conclude that the 
gorithm is the most efficient. It must be considered, however, that 
‘the state space (i.e., M = |Q t |) increases, the number of iterations 
an estimate with an equivalent degree of precision will increase much 
Metropolis or Heat Bath cases, than in the Gibbs sampler, since in the 
: are using an "importance sampling" procedure, versus the uniform 
le former (see Hammersley and Handscomb, 1965). 

is analysis of the tradeoffs involved is not easy, and is highly dependent 
of the particular problem, so that an experimental analysis might be 
irify these questions in each case. In the more interesting case of a 
:mentation, however, the Gibbs sampler becomes the obvious choice, 
y this assertion in the following sections. 

Jpdating. 

ary condition for the convergence of the probability measures of the 
ts defined by the Metropolis, Heat Bath or Gibbs Sampler algorithms 
measure is that if two sites belong to the same clique, they are never 
e same time. As we will show in the next section, this condition is 
only for the case of the Gibbs sampler. In this case it is possible to 
aneously the states of all non-neighboring sites, by implementing the 
a parallel architecture in which a processor is assigned to each site. 
:ution time will then be reduced by a factor of 

N 

K 

the so called "chromatic number" of the graph that describes the 
structure, and it is equal to the minimum number of colors needed 
ites of the lattice in such a way that no two neighbors are the same), 
he state of every site is allowed to take real (continuous) values, we 
merical simulation of the stochastic differential equation: 

df = —grad U(f)dt + yfTTdw 
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to generate 
chapter 2). I 
implementa 

6.2.1. Conve 

Geman 
defined by tl 
the initial st< 
only that we 
The converg 
result for wt 

First, w 
i, every valu 


Since b; 
two states of 
finite numbe 

On the < 
of the chain, 
possible coni 
measure: 

After the up< 
Pr( 


sample configurations from the desired distribution (see section 3.3 of 
n this case, all sites can be updated at the same time, so that a parallel 
ion can reduce the complexity by a factor of N. 

gence of the Gibbs Sampler. 

and Geman (1984) established that the measure of the Markov chain 
le Gibbs sampler will converge to the Gibbs measure independently of 
te, independently of the order in which the sites are updated (provided 
keep visiting every site, i.e., that we update its state infinitely often), 
mce of the parallel implementation, therefore, follows from this general 
ich we present here a simple alternative proof: 

5 note that from the definition of a MRF, it follows that for every site 
j q e Qi , and every configuration /, the conditional probability, 

Pr (/.- = Q I fj , j 7^i)> 0 

hypothesis every site is visited infinitely often, this implies that any 
the chain will be mutually accessible (with positive probability) in a 
' of steps, which means that the Gibbs sampler defines a regular chain. 

)ther hand, the Gibbs measure i r(/) is an invariant probability vector 
To see this, suppose that at time £, just before updating site i, the 
igurations of the field F{t) are distributed according with the Gibbs 

PT(F(t) = /) = t r(/) 

late we have: 

F(t + 1) = /) = Pr (Fi(t + 1) = /,• | F 3 {t) = f i , j ^ i ). 

•Pr (*!•(*)=■/y . JV«1 = 
















j jL i) = 7r(/) 


because, by 
selected ran 
now comph 
(see Kemen 

(i) Has a i 

(ii) The me 
measur 

Note tli 
the Gibbs sa 
on the satis 
chapter 2), ; 
order. We \ 
parallel upd 
parallel imp 

6.2.2. Breaki 

To shov 
updating set 
with Ising p< 


To imp 
two non-ovi 
"white” site; 

Let fw, 
that / = {/, 


— ^{fi | fj ) J 7^ z ) * *(fj j 

the definition of the algorithm, the new state of the i th element is 
iomly according with the conditional Gibbs distribution. The proof is 
ted by remembering a well known theorem for finite Markov chains 

/ and Snell, 1960) that establishes that every regular Markov chain: 
nique invariant probability measure. 

isure of the chain will converge (with probability 1) to this invariant 
; independently of the initial probability distribution of the states. 

at, unlike the Metropolis and Heat Bath algorithms, the convergence of 
mpler does not depend on the reversibility of the chain (or equivalently, 
"action of the "detailed balance" condition given by equation (3) of 
Llthough this condition will hold if we use it with a random updating 
ill now see that the reversibility will not hold in general if we use a 
iting scheme, which will make the first two algorithms unsuitable for 
ementations. 

lown of Reversibility for Parallel Updating. 

why this condition is violated (by the three algorithms) when a parallel 
erne is used, we will consider a first order, binary MRF on a lattice L 
>tentials, that is, 

/» £ {0, 1} for all i e L 
r-1, if \i -j\ = 1 and /,• = fj 

Vcifi, fj) = 11, if \i - j\ = 1 and /,• ^ fj 

lo, otherwise 

iment a parallel updating scheme, we divide the sites of the lattice into 
rlapping sets, which we will call B and W (the sets of "black" and 
respectively) as illutrated in figure 7. 

r j3 denote the state of the elements belonging to W and B, respectively,so 
r ,fn}‘ The parallel updating scheme consists in updating first, say, all 
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Figure ’ 

the white sii 
with any tv\ 
of the elem 
is updated i 

Let P M 
of all the \n 
transition p 
they are cle; 

1 

and similarl 

where n is ti 

Now, 1( 
"white-blacl 

Pu 


• O • O • 

o • o • o 

• o # o • 

o • o • o 

• O • O • 

Non-overlapping sets for parallel updating (sec text) 

es, and then all the black ones. Note that the random variables associated 
□ sites of the same color are conditionally independent (given the state 
:nts of the other color), which means that the order in which their state 
»immaterial, so that, in fact, they can be updated simultaneously. 

,Pb , denote the transition probabilities corresponding to an update 
tiite and black sites, respectively. Note that both Markov chains with 
obabilities Pw and Pb satisfy the detailed balance condition (although 
rly not regular), so that for a fixed / fl , we have: 

A 

W[{fw, Zb}, {fw> Zb}) = Zb}, {Zw, Zb}) 

r , for a fixed Zw . 

*b{{Zw, Zb}, {Zw, Zb}) = p b({Zw, Zb}, {ZwZb}) 

ie Gibbs measure of the complete configuration / = {Zw,Zb}- 

t Pwb(Z,Z) be the transition probability associated with a complete 
M update (where the white elements are updated first). We have: 

b(Z,Z) = Pw{{Zw,Z b}, {Zw,Zb})Pb{{Zw,Zb},{Zw,Zb}) = 
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where Pbw 
sites visited 

Now, c 


and let 


Clearly, 


and so, 


= Pw({fw> fit}* {fw, fit}) 


7T Cfw>M 


•Pdlfw, fn}* Ow* fn}) 


dfwtfo) 

*Cfw> fo) _ 
dfw*fB)‘ 


= f) 

is the transition probability of the converse "black-white" update (black 
first). 

insider die particular configuration: 


‘-C 


iew 

ieB 


fi = 1 for all i e L 


PbwOJ) > Pwn(f*f) 


Af)PwB(fJ) > df)PwB{fJ) 


so that the detailed balance condition does not hold. 


The at 
prescribed i 
by any of th 
will remain 
the configui 
distribution 
of this invai 


the Gibbs 


Cr! 


)ve argument can be easily generalized to show that if we use any 
Ddating order (such as lexicographic order), the Markov chain generated 
j three algorithms will also become irreversible. These chains, however, 
regular, which means that in each case, the probability distribution of 
itions generated by the chain will converge towards a unique invariant 
In general, however, it will not be possible to guarantee the coincidence 
ant measure with the desired Gibbs distribution, except in the case of 
mpler. 


An example of a situation in which the invariant distribution is not the 
Gibbsian measure, can be obtained by running the Metropolis algorithm, either 
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with lexico 
section 2 o 
of the infin 
are almost 
predictions 
order is us 
Bath schemi 


'aphic or "black-white" updating order for the Ising model discussed in 
chapter 2. If the natural temperature is below the critical temperature 
te lattice, the algorithm will produce equilibrium configurations that 
:ompletely uniform, and therefore, inconsistent with the theoretical 
and with the behavior of the same algorithm when random updating 
j). The Gibbs Sampler (which in this case is equivalent to the Heat 
), on the other hand, produces consistent results, as expected. 


6.3. Discusslm. 


The pri 
time) for the 
our general 


where n is t 
graph of the 
Sampler, giv 

An exa 
Machine" (1 
processing ol 
It is a "Sins 
256,000 proc 
4K bits of ; 
square. Besic 
computation; 
a "Cross Om 

At each 
one microse< 


vious results mean that the expected computational cost (execution 
solution of a reconstruction problem on a large parallel machine, using 
tfonte Carlo procedure, will be given by: 


Cp = n • K • Cqs 


( 21 ) 


bit is trans 


ii 


be impleme 


le number of (global) iterations; K is the chromatic number of the 
underlying Markov model, and C GS is the updating cost of the Gibbs 
in by equation (19). 

nple of such a massively parallel architecture is the "Connection 
lillis, 1985). This machine was originally designed for the parallel 
structured symbolic expressions, such as frames and semantic networks, 
le Instruction Multiple Data" (SIMD) array processor consisting of 
issing units (each with a single bit Arithmetic/Logical unit, and about 
torage) organized in a four-connected lattice that is 512 elements 
es this nearest-neighbor connectivity, it will also be possible (although 
lly more expensive), to connect any two processors in the array using 
iga" router network (Knight, in Winston, 1984). 

cycle of the machine, for which we will assume a duration of 
ond, an instruction is executed by each processor, and a single 
itted to its neighbors. This means that the updating scheme can 
ted most efficiently if the field is first order Markov, but higher 
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with the ord 
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finding the < 
(i.e., the seg 
will analyze 
estimator is 
the use of af 
integers. We 
cycles of a sii 
or comparisc 
generating a 
16 cycles foi 
exponential. 

Assumir 
time we get, 


Althoug 
this approacl 
more conven 
continuous v; 
smoothness c 


where w is a 
section 2.2). 

This sch 
discrete (e.g., 


ses can also be implemented without using the router by successively 
the transmitcd state (the execution time, therefore, will grow linearly 
^r of the field). 

? these results more concrete, consider, as an example, the problem of 
>ptimal estimate for an M-ary, first order MRF with lsing potentials 
nentation of a piecewise constant image) from noisy observations (we 
this problem in detail in the next chapter). Let us assume that the 
o be implemented in the "Connection Machine", and suppose that by 
propriate scaling factors, all the numbers can be represented as 16-bit 
will use the following conservative assumptions: We assume that 16 
igle 1-bit processor are needed to perform 16-bit addition, substraction 
n; 16 2 cycles to perform multiplication or division; 2 x 16 2 cycles for 
pseudo-random number with uniform distribution on a given interval; 
memory transfer operations, and 6 x 16 2 cycles for computing an 

g that we run 250 iterations of the system, and ignoring the overhead 
from (19) and (21), 

C P 1.4 (M - 1) seconds (22) 

i this execution time may be reasonable in many cases, it is clear that 
becomes impractical as M becomes large. In this case, it might be 
ent to approximate the field by one in which the state at each site takes 
ilues in a compact set and, provided that U P satisfies the appropriate 
mditions, use the stochastic differential equation: 

df — -grad Up dt + \/2Tdw (23) 

Wiener process, to simulate the behavior of the system (see chapter 2, 

me will not work, however, if some of the variables are intrinsically 
binary variables indicating the presence or absence of a boundary). In 
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this case, it 
discrete var 
ones using < 
determined, 

These c 


ways of solv 
following di 


might still be possible to use a mixed scheme in which the state of the 
ables is updated using the Gibbs Sampler, and that of the continuous 
quation (23), but the precise form of such mixed schemes has not been 
nor their convergence properties established. 


3nsiderations provide us with a strong motivation for finding alternative 

ing these problems. In particular, much more research is needed in the 
ections: 

(i) Design )f more efficient (possibly deterministic) algorithms for approximat¬ 
ing the 3ptimal estimators for particular classes of nmhipmc 


(ii) Design 
algorith 


. . ~ /-' • *'•* ***»-» »V/I uppiUAllIiat 

iptimal estimators for particular classes of problems. 

if analog and hybrid networks for implementing these kinds of 


We will 
in the follow 


study these possibilities in detail, in the context of specific problems 
ng chapters. 
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Chapter 4 


REC 


NSTRUCTION OF PIECEWISE CONSTANT FUNCTIONS 


I. Introduction. 


In this 
developed, t 
observations 

(i) Binary 
useful i 
and ma 

(ii) Several 
(Elliot, i 
or the 
(1976)), 
constam 


chapter we will apply the optimal Bayesian estimators that we have 
3 the problem of reconstructing piecewise constant functions from noisy 
The efficient solution of this problem is relevant for several reasons! 

mages (or images consisting of only a few grey levels) are directly 
i many interesting applications (for example, object recognition 
lipulation in restricted (industrial) environments). 

perceptual problems, such as the segmentation of textured images 
t. al. (1983); Hansen and Elliot (1982); Cohen and Cooper (1984)), 
ormation of perceptual clusters (O'Callahan (1974); Marroquin 

can be reduced to the problem of reconstructing a piecewise 
surface. 


be adeq 
processe 
way. We 


(iii) As we ' ill see in the next chapter, where we treat the reconstruction of 
piecewij e smooth surfaces, the boundaries between continuous patches can 
be adeq .lately modeled by binary fields coupled with continuous valued 
5. These coupled systems are very difficult to analyze in a rigorous 
hope to increase our understanding of them by studying first the 


estimation of binary fields. 


2. Problem Formulation. 


Followi 
constant fun 
Ising potenti 


g Geman and Geman (1984), we will model the behavior of piecewise 
:ions using first order MRF models on a finite lattice with generalized 
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V f ;(/.,/y) 


if |> - j\ = x and /,• = / y 
if |t-i| = l and/.-^/y 
otherwise 


We wil 
site will be: 
at a corner. 


/»• € 6; = (<?i,.. •, qm} for all i 

use a free boundary model, so that the neighborhood size for a given 
> if it is in the interior of the lattice; 3, if it lies at a boundary, but not 
md 2 for the corners. 


The Gifcbs distribution: 


defines a or 
constant pati 

Using tl 
section 2.1 o 
that chapter: 


with 


Of particular 
taken as the c 
1975), so thai 


p /(f) — \ exp(-i-C/ 0 (/)] 
Oo (/) = znfi.fi) 


3 parameter family of models (indexed by r 0 ) describing piecewise 
srns with varying degrees of granularity. 

e general stochastic model for the observation process presented in 
* chapter 3, we get the posterior distribution given by equation (6) of 


p f\ g if>9) = ^exp[-C/ F (/;g)] 


U P (f ; g) 


u o (/}+ J2 

ies 


interest will be the case of binary fields (M — 2) with the observations 
utput of a binary symmetric channel (BSC) with error rate e (Gallager, 




n<>,i/<>-{““ 4 

U, for ^ Si 

In this case, t te posterior energy reduces to: 


u p(f\g) = =- £ V(fi, fj) + a £(1 - S(fi - Si)) 
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where /,• e 


and 


Note that in 
modifying 
variables {/J 


> 1 , 92 }; 


trie 


which corre 
spatially var 
The imports 


fl. 

if a — 0 


0 , 

otherwise 

(5) 

a = In j 

fl - e\ 


, . ) 

(6) 


this case (and also in the case of additive white Gaussian noise), by 
constant Zp, and applying a suitable linear transformation to the 
}, so that Qi = {- 1 , 1 }, we can write the posterior energy in the form: 


Ur(f ;g) — /«/y + fi9i 

0 » 


(7) 


ponds to the Hamiltonian of an Ising ferromagnet coupled with a 
'ing external magnetic field (whose magnitude is proportional to g). 
ice of this connection is twofold: on the one hand, it means that the 
tools develo|ed for the equilibrium behavior of these systems — which is what 
the estimatic n process is about — may be relevant for the physicists. On the other 
hand, it is co iceivable that one could use physical ferromagnets to construct special 

purpose "qi intum" computers that could solve estimation problems at atomic 
speeds. 

In the f blowing sections, we will study the relative performance of different 
Bayesian esti jnators, and design efficient algorithms for approximating them in some 
important particular cases. 


3. Relative plrformance of Bayesian Estimators for Binary Fields. 


Once the! 
problem by 
in chapter 3, 
To understand 
the estimation! 
error rate e. 


posterior energy has been determined, one can solve the reconstruction 
f nding the optimal Bayesian estimate of the field /. As we discussed 
lowever, we have several possible choices for the optimality criterion, 
the differences in their performance, we will now analyze in detail 
of binary fields, when the observations are the output of a BSC with 
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F| g ure 8 - R atio of the average errors of the MAP and MPM estimators for a 2 x 2 Ising net 

of the lattice) panel (b), the output of a binary symmetric channel with error 
rate e = 0.4, panel (c) the MAP estimate, and panel (d) an approximation to the 


MPM estimate (which we will label "MPM (M.C.)") obtained using the Metropolis 
algorithm and equation (10) to estimate the posterior density. The corresponding 
values of the josterior energy Up (equation (13)) and the relative segmentation 


error (e,/64 2 ) fre shown on table 1. 
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Energy 
Seg. Error 


A A A 

/map !m 


-5594.8 


-226.0 


-6660.9 

0.33 


-6460.0 

0.128 


-6427.0 

0.124 


4. Exact Algorithms for the MAP Estimator. 


From tli 
noise ratio i< 
one can dcsi 
the case of o 
which compt 
is O(N) (the 
needed, and 
distributed in 
lattice length. 


e discussion of the previous section, it is clear that if the signal to 
not too low, the MAP criterion may be an appropriate choice, if 
?n efficient algorithms for computing it. As we will now show, in 
le-dimensional binary fields, one can in fact construct an algorithm 
tes (exactly) the MAP estimate with computational complexity which 
length of the lattice) in a serial machine: at most 22 N operations are 
the storage requirements are also O(N). The algorithm can also be 
a parallel architecture, making its execution time independent of the 


To simplify the notation, we will assume that /,• £ {-1,1} for all i (there is no 
loss of generality in this asumption, since any binary process can be brought into 
this form by | reversible linear transformation). Also, assuming the noise process is 


stationary, w 


where To is 
From eq 


introduce the notation: 




natural temperature of the field. 


ations (1) and (3), it is clear that the MAP estimation problem is 


equivalent to (he minimization of: 


U F (f) = n + ^/,(?»■) 
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where n is tie number of places where / t - 7 ^ /,•+, (the number of odd bonds of the 


configuratio 
be reduced I 
for the odd 
blocks). We 


1 ). From this expression, it follows that the MAP estimation process can 
3 the problem of finding the optimal value for n, and the best locations 
)onds ( which we will also call "boundaries" between constant-valued 
will now present a procedure for performing this task. 


Description pf the Algorithm. 


The iden in which this method is based is the following: 


We stai 
estimate k e 
is initially se 

Whenev 
by putting a 

[M, that is 


where 


t scanning the sequence {&}, say, from the left, with some initial 
{—1,1} for the value of / in the block that starts at Z 0 (a pointer that 
: to 1 ). 

er we process a new observation gj, we ask if we can get a lower energy 
boundary in j and in the best possible location Z within the interval 
we ask if: 

Uf, 4 - 1 < Up 


Up = E *+*(») 

i=l 0 

u b = 1 + £ *+*(») + E *_»(«) 

*— lo *— /+1 

As we will s( e below, the optimal boundary location l (which is initially set equal 
to Z 0 ) needs 13 be updated only if the conditions: 


Sf L ^ k 


rML / rML 
Jj 

l 

V-k(9i) < X) “ *-k(9i) 

t=/o 


hold simulta 
maximum li 


E *+*(».•) 




eously, in which case Z is set equal to j - 1 . Here, f ML denotes the 
lihood estimate; since we are using a white noise model, it is given 
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by: 

If we g 
update the 
estimate foi 

Otherv 

When 
process bac 
to make thi 
that we car 
parallel ove 
subsequeno 
in /. The fii 

Formal 

Definition o 

i: Current p 
/ 0 : Pointer t 
l: Current o 
k: Current e 
U p \ Energy 
U m : Energy 
U b : Energy i 
si: Best loca 
siml: Best 1 


if *+i(sy) < «--i(9i) 

3 l—1, otherwise 

it a lower energy by putting a boundary at Z, we set /,• = k for i e [Z 0 , Z]; 
value of the pointer Z 0 by setting it equal to Z + 1, and set the new 
the value of /, in the block that starts at Z 0 , equal to -k. 

ise, we just set f 3 - = /c, and continue to process the next observation. 

we reach g yy, we take as the initial estimate and run the same 
:wards to get the final solution (in fact, one can show that it is possible 
5 backward run as soon as we get the second boundary). This means 
implement the algorithm in a distributed fashion, by processing in 
laping subsequences of {&}, provided that the length of each of these 
:s is greater than twice the length of the largest constant-valued block 
al solution is then obtained by pasting together these partial estimates. 

y, the algorithm is as follows: 

‘ Variables. 

osition. 

) the beginning of the current region. 

)timal location of the boundary in the interval 
stimate for f([l 0 , Z]). 

ncrement associated with the assignment f{[lo,i]) = k. 
increment associated with the assignment f([lo,i]) — -k. 
icrement associated with the assignment /([Ml) = k; f{{l,i]) — -k. 
(maximum likelihood) estimate for /,-. 

>cal (maximum likelihood) estimate for 
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U pl : Energy 
U rnl : Energi 
Utcmp> Tern | 
M: A very I 
#o: Switch 

Algo 

1: Ini 
Set 
Set 

Set 
2: Mi 
Beg 
S 

2 . 


increment associated with the assignment f{[k>l}) = k . 
increment associated with the assignment /([/ 0 , /]) = -A;. 
>orary storage register, 
arge positive number, 
ndicating the method for estimating f\. 

rithm A1(K 0 ): 

tialization. 

= ^ = 1> Up — U n = U ml = 0; Ub = l‘ } Upi = Af. 

fc = 1 , if #0 = 0 and $ + 1 ( 01 ) < _i(gi) ; 

- 1 , if # 0 = o and ^ +i(sn) > # _i(gi) ; 

# 0 , if #0 7 ^ 0 . 
siml — k 

in Loop: For i from 1 to N do: 
n 

;t Si = l, if # +i(g { ) < V _i( s< ) ; 

—1, otherwise. 

1: See if the optimal boundary location needs to be updated: 

If (si k and si 7 ^ szml and U p - U p i — U m + U m i < 0 ) do 

Update boundary location: 

Set: 

l = i -1 
Upl = Up 
Uml == U m 

u h = Up + 1 
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\2: Update energy increments: 

Set: 

Up = Up + # + *fe) 

Um == C/ m ^ — k(fft') 

u b = u b +9 _*(*.) 

1.3: See if a new boundary has to be introduced: 
If (C/ fc + i < C/ p ) do : 

Introduce a new boundary: 

For j from l 0 to l do : Set fj = A: 

Set: 


End 


3: Se 


k = —k 
Iq = I -4- 1 

Uttmp === C/p C/p/ 
u p = U m - Una 

Urn == Utemp 
C/p/ = M 

£4 = ^m + 1 


E.4: Set stml — 


if the last boundary has to be introduced: 


If (14 < %) do : 


.1: For j from l 0 to l set fj — k. 
2: Set l 0 = l + i. 


4: Fill 


For 


End. 


.3: Set k — -k. 
the last region: 


j from l Q to N set fj = k. 
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The proof 
(7) is prese 


of the fact that this algorithm will in fact find the global minimizer of 
i ted in appendix 4.A. 


In app 
is based on 
the one we 
of being ex 
compute thl 
relationship 
filters to pn 


•< ndix 4.B we present an alternative approach to this minimization, which 
lynamic programming ideas. The resulting algorithm is less efficient than 
lave just presented for the case of binary fields, but it has the advantage 
:ensible to handle more general situations. Also in this appendix, we 
probability distribution for the number of odd bonds, and discuss the 
between the dynamic programming procedure, and the use of linear 
iduce multi-scale descriptions of piecewise constant signals. 


5. Estimation of Two-Dimensional Binary Fields. 


The te<jhiques developed in the last section for the exact computation of the 
MAP estim 
here is that 
one dimens 


te cannot be extended to the two-dimensional case; the main difficulty 
be geometry of the boundaries between uniform regions (which in the 
onal case are simply points), causes a combinatorial explosion of the 
number of possible configurations compatible with a given total boundary length. 


The questio| 
the optimal 
efficient thali 


5.1. MAP Estimator. 


fc* 


In the 
algorithm 
of sites (in 
Wilson (197f 
of critical p 
each of thes 
blocks in su< 
entropy assu 


p, then, is whether it is possible to find algorithms that approximate 
estimates (with respect to the selected error criterion), that are more 
the general Monte Carlo procedures presented in chapter 3. 


|ase of the MAP estimator, the efficiency of the Simulated Annealing 
r the minimization of Up can be improved by defining large "blocks" 
||a manner that is reminiscent of the "block-spin" strategy used by 
) in connection with the renormalization group approach to the study 
enomena); the optimal estimate for the average value of the field in 
5 blocks is found, and then progressively refined by subdividing the 
cessive annealing stages. We will now show that, if we use a maximum 
nption, the structure of the MAP estimation process for Ising models 
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is invariant 
(i.e., the M 


under the "blocking" transformation; this means that the ground state 
AP estimator) of the aggregated process (with blocks of size L) also 
correspond* to that of an Ising model with a coupled external field, in which the 
natural tern >erature is scaled by a factor of 1/L, and the noise (coupling) parameter 
by a factor of L 2 . As a consequence of this scaling, the final temperature for the 
simulated a mealing of this smaller network will be approximately L times larger 
than for the original problem. 


Let us consider a binary Ising net / with the observations taken as the output 
symmetric channel with error rate e. From section 2, we know that the 
irgy will be: 


of a binary 
posterior eni 


with 


and 


Up = //) + <*£ 9(/i, 9i) 

0 t,J i 


( 8 ) 


q(f> 




if 9i = f% 
if 9i 7^ fi 




Notice that Equation (8) can also be written in the form: 


Up = ~ £ Vciiu fi) + a £ q c (h, 9i) 


where Vc,q\ are continuous functions satisfying: 

Vc{x, y) = V(x, y) and 


We will novl 
partition the 


(S’) 


qc{x, y) = q(x , y) for x,y e {0,1} 

derive an expression for the energy in the "block spin" case. Let us 
loriginal lattice L into square blocks of side L. The "block observations" 


g L will now be the density of l’s on each block, i.e., 


9L(t) = 72 E 9j 

L jEBi 
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where Z? t is 

For a j 
con figu ratio 
are random! 

1. Interactio 

The int< 

where P kl is 
to an elemer 


Substituting 

2. Interactioi 

This ter 
11,10,01 and 
are 2 L[L - 1] 


the i th block. The "block field" fi is defined in a similar way. 

;iven //,, we compute the energy by assuming a maximum entropy 
i, which occurs when the l’s that correspond to the given density f L (i) 
i distributed within the block. The energy will have three terms: 

is between adjacent blocks: 

faction between two adjacent blocks i and j will be: 

Uj — [—1 • (Pn + Poo) + 1 • (Pio + Poi)] • L 

the probability of having an element with state k on block % adjacent 
t with state l on block j: 

Pll = fi(i)h(j) 

Poi = hu)( i - m) 

Pio = f L {i)( 1 - f L (j)) 

Poo = (1 - /l(0)( 1 - h(j)) 

liese values we get: 

Uj = m/ L (i) + /,(;)) - 4 f L (i)hU) - 1 ] 
s within each block: 

n depends on the relative frequencies of the clique configurations 
00 (ph,pio,poi and poo, respectively) on each block (note that there 
different cliques). Since the l’s are randomly distributed we get: 

pn = m 2 

Pio = Poi = /l(0(1 - /l(0) 

Poo = (1 - fiA*)) 2 


76 









so that the 




ternal interaction /, is: 


3. Interactic 

Assumi 
distributed 1 

Finally. 


note that the 
respectively. 

For L > 1, t 

and since 


Ii = 2 UL - l)(-4 f L (i) 2 + 4 hXt) - 1) 
n with the observations: 

ig that the l’s in the observations and in the field are independently 
ve get: 

hba{i) = aL 2 [f L (i)( 1 - g L (i)) + (1 - /l(0)^(*)1 = 

= «£ 2 [/ l (*) + 9lXi) - 2fiXi)gi[i)} 
the energy takes the form: 

ulu ,,)=~ e hi ++wo)= 

io ij i 1 0 

= + A(i)) - 4/iM/z.W -1] + 

J ° «-,y 

+ ^(i - 1) E(-4 /lW 2 + 4//,(t) -1) + 

+ 0l(O - 2/l(i)(7l(0} 

t 

sums are taken over pairs of adjacent blocks, and over all the blocks, 
For L — 1 , this expression reduces to (8’) with 

Vc(fi, fj) = 2(/,- + /y) - 4/i/y - 1 

9c( g j b) — a + b — 2a6 
ie quadratic terms of Ui are: 

“[-4 E AW AM - 8(Z - 1) E AW 2 ] 

10 i,j i 

-2 E AWAM + 2 E AM 2 = E(A(0 - AO)) 2 > o 
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it follows 

and 


which impli 
constrained 
lie in a corni 
to find the g 
the energy t 
constant): 

The minimi 
rep resen tatic 
refinement ( 
solution as a: 
(the MAP es 
At present, b 
determining 

Also in 
This author 
pragmatic cr 
algorithm, b; 
envelopes to 

The relai 
should be ass 


at 

EAW* > EfiWdi) 

-4 e mm - §(l - 1 ) e m 2 < 

< -(4 + 8 (L - 1)) £ f L (if < 0 

X 

* s that £//, is negative definite for L > 1, and therefore, its minima, 
o the hypercube [0,1]^ ( N L is the total number of blocks) will always 
t of such hypercube, which means that we can use simulated annealing 
obal minimum of U L , constraining the search to ( 0 , 1 } Nl . In this case, 
a be minimized takes the simpler equivalent form (up to an additive 

Ul = T^/L ^ «^y)) + aIj2 viMi), 9 l[})) 

m energy solutions for each L can be interpreted as ’’coarse scale” 
ns of the original pattern /. Once a solution is obtained, the next 
or blocks of size L/2) can be efficiently obtained using the previous 
tarting point, and initiating the annealing process at a lower temperature 
timates presented in this chapter were obtained using this technique), 
owever, we do not have a good method (other than trial and error) for 
Jie optimal values for these initial temperatures. 

his connection, the work of Blake (1983, 1985) should be mentioned, 
proposed the minimization of an energy function similar to Up as a 
terion for restoring piecewise constant images. He also proposed an 
ised on the successive approximation of U P by a family of convex 
find an approximation to the global minimizer. 

ive performance and computational efficiency of these various schemes 
essed experimentally. 
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5.2. MPM Estimator. 


In the c 
algorithm wlf 
error) is equ 
following iddas 


First, wq 
We will app 
Pa with the 


fse of the MPM estimate, it is possible to construct a fast deterministic 
ose experimental performance (in terms of the average segmentation 
((valent to the Monte Carlo method discussed above. It is based on the 


recall that for a binary pattern, the MPM and TPM estimates coincide, 
(oximate the posterior mean of (3) by that of a Gaussian distribution 


property: 


In partiq 


Pa(h) = i-e-'M'*) for all h e {0,1}. 

6p 


blar, we use: 


foW = b «p[-5t £ £ (*. - M 2 - « £(*,- - «)*]• 

, j 6N . i 


where 


Ni = {j€L : ||i-i|| = i}. 

For this distribution, h is the (unique) minimizer of the convex function: 

UgW = i- E £ (fc.- - hi? + « Uhi - m? 

i0 « jeNi i 

which corresponds to the unique fixed point of the system: 

(*+i) _ Sj -eNi tip + aTogi 


We coul| now approximate our estimate by putting: 

U = 0(Ki) 


where 


hi- ' = 


WI + aTo 


(9) 


0(x) 


=r* 

to, 


if*> } 

otherwise 


( 10 ) 


79 



There is an 
be shown 
the MPM e; 


tH 


also satisfies 


a new MPM 


additional consistency condition that f must satisfy, however. It can 
at when the posterior distribution has the form given by (3) and (4), 
intimate /, which by definition satisfies: 

p i\g(fi'> 9) > - /*); g) 


p i\ g (fi\ f) > ^((l - /,*); /) 


(ii) 


which mean; that if we replace the observations by the MPM estimate, and compute 


estimate for this modified problem, we should get the same result (the 
proof is included in appendix 4.C). Translating this condition to the case of/*, we 
get that it m|jst satisfy: 

f = e(^) 


( 12 ) 


h, 


where h* satisfies: 

Ey € /v,. h) + aToQ(h-) 

m + aTo 

In practice, \|e get h * as the fixed point of the system: 

*(*+1) £j ~eNi + ctTo&(hW) 

hi = -pV^Tb- 


with 


Note that thd 


acts as a Lyaf 
(Vidyasagar, 


This algtf 
we extract all 


(13) 


A*<°> = h 


function: 


u h (h) = E (ft.- - h if +«2o E(*i - ©(ft.)) 2 


ijENi 


unov function for the system (13), which is therefore (locally) stable 
1978). 


rithm can be visualized as operating in two steps: In the first one, 
the information that we need from the observations and encode it in 
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h (which is c 
pattern that 5 

To illus 
example disc 
segmentation 


ontinuous-valued), and in the second one, we find the closest binary 
atisfies the consistency condition (11). 

A * 

rate the performance of this approximation, we show f , for the 
assed above, in panel (e) of figure 1, and its corresponding energy and 
error in the last column of table 1 (labeled "MPM det"). 


5.2.1. Parallel Implementation. 


The dyr 
directly in a 
a processor 
Each update 
multiplicatio 
experimenta 
needed for < 
total execute 
magnitude a 

5.3. Analog I 

Hopfiel 
behavior of 
resistors, wh 


Here, N,* is 
voltage of ti 
i and j\ U 
the internal 


amical systems defined by equations (9) and (13) can be implemented 
parallel architecture, such as the "Connection Machine", by assigning 
to each site, and updating the state of all sites at the same time, 
will require, for both systems, at most 10 (16-bit) additions and two 
ns, that is, a total of 672 cycles of a 1-bit processor. We have found 
ly that in most cases, less than 50 iterations of (9), and 100 of (13) are 
:onvergence, so that, using the figures of chapter 3, we estimate the 
)n time as approximately 0.1 seconds, an improvement of one order of 
ver the general Monte Carlo procedure described in that chapter. 

Networks. 

i and Tank (1985) (see also Hopfield, 1982 and 1984) have studied the 
"neural" analog networks of non-linear amplifiers interconnected by 
3 se dynamics can be described by the differential equations: 


— Tijfj +I{ 
jeN< T 

ft = ©K) 


(14) 


the neighborhood of node i\ and /,• denote the input and output 
le i th amplifier; T tJ - is the conductance of the link between the nodes 
s a fixed current injected at node », and r, a constant depending on 
resistance and capacitance of each amplifier. The gain function of the 
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amplifiers, 0 
interval (0,1), 
(hence the te 


where f3 is ca 


These re 


we have T t 




i|earchers have proved that the system (14) is always stable, provided 
Tji for all i,j, and in the high gain limit (for /3 >> 1), the stable 
fixed points vfill be local minima of the "energy" function: 


Note thaf 


They ha r 
points of (18) 


with 


These equatif 
1965) to the 
the Gibbs md 
can be shown 


■) is chosen as a sigmoid function that restricts the output to the 
and has a form similar to the observed response of biological neurons 
m "neural"). In particular, one can put: 

1 


e(u) = 


1 + exp[— (3u] 


(15) 


led the "gain parameter". 


| we can write (14) as: 

du{ 

dt 


W—l'ZTiififi-'LfiU 

*,i * 


(16) 


dE 


Ui 


fi 


dfi 

0(u t ) 


(18) 


e also pointed out that if one uses the gain function (15), the fixed 
will satisfy: 

1 


fi = 


1 + exp [—/?r if,•(/)] 


(19) 


Hi(f) = 


dE 

dfi 


E T afi + U 

jeNi 


( 20 ) 


ins will also be satisfied by the mean field approximation (see Reif, 
(insemble averages of a binary process / (/,• E {0,1}) with respect to 
asure generated by the energy (16) at a temperature T — 1 //?r. This 
as follows: 


The mean field approximation is obtained by assuming that the local energy at 
node i, whicn is: 


Ei[f) = -/,[ E T afi + fi] = -fiHi(f) 

jeNi 


82 



can be approximated by: 


Ei « -fi[ £ Tiffi + U\ = -fiHiCf) 

jeNi 


where f 3 de lotes the ensemble average of f 3 . Since {fj} are constants for a given 
temperature we can compute /,* as: 


- S/^o,i/»exp[-g,-(7)/r] 

* £/,=o,i exp [-Hi{J)/T\ 


1 + exp [-Hi(f)/T] 

This means mat there is a fixed point of equation (18) that can be interpreted as an 
approximati on of the ensemble average of a corresponding binary MRF (note that 
in general tl is fixed point will not be unique, and will depend on the selection of 
the initial cc nditions; the lack of an adequate criterion for making this selection in 
the general < ase represents, at this point, a serious limitation of this approach). 

In the < ase of the posterior energy (4), if we require that / t - £ {0,1}, we can 
write it in tl e equivalent form (up to an additive constant): 

up(f )=-kE Mi - +«(2j, - m 

lQ i jeNi i i 0 


so that 


dUp 4 y-s . , v 2|iV t - 

Hi = = =rl f /j + a ( 2 ^' - 1 )- -7F~ 

ofi Tq j eN , Tq 


In this form one can construct directly the system (18), and defining the initial state 
as /J°) = u[ ^ = 0.5 for all t, find the stable fixed point that will approximate /. 
Since for a )inary system the MPM and TPM estimators are equivalent, we can 
approximate the optimal estimate by: 


> - c 


if fi < 0.5 
otherwise 


We have pe formed digital simulations of the system (18), and have found very 
good perfori lances for relatively high signal to noise ratios. For high error rates. 
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the behavio|of this approximation is similar to that of the MAP estimator. We will 
have to say more about this approach at the end of the next chapter. 


6 . Sirnultam 


T 


us Estimation of the Field and the Parameters. 


To app y the estimation procedures described in the previous sections, the 
parameters t lat characterize, both the prior model of the field (the natural temperature 
To), and the noise process, (the error rate e, or the variance a' 2 ) have to be known. 
In most pra tical cases, however, we are only given the noisy observations g and 
general qualitative information about the structure of the field and the noise, so that 
f,a (which |tands for either log[(l - e)/e] or a) and 7b have to be simultaneously 
estimated. 

In principle, one could use again a Bayesian approach, and assuming prior 


independent! 

respectively) 


The ma 
partition fun 


which makes 


uniform distributions for a and T 0 (in the ranges [a 0 , a 1 ] and [rg,rj], 
find those a, T 0 and f which jointly maximize the posterior distribution: 

|n difficulty here is the extraordinary computational complexity of the 
:tion: 

Z(T 0 ) = Eexp[—i-C/ 0 (/)] 

/ J o 

this approach impractical, except for very small lattices. 


An alternative approach is based on the following considerations (we will study 
in detail the case of a BSC; other noise models can be analyzed in a similar way): 


Equatio 
Impm depeii 


s (9) and (13), which describe the deterministic approximations to 
ild on the parameters of the system, e and T 0 , only through the product: 


7 = a7o = 7 q log 


( 21 ) 


which means 
single param 


that the behaviour of the algorithm is completely characterized by the 
;ter 7 . In the case of the Monte Carlo approximation, if we fix the 
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value of 7 , 
consistency 


with 


where 2 is 


This means 
in an adapti 
depends effc 

For a g 
/ using the 
process 2 ar 
rate e using 
will be: 

To mea 
(or "whitern 
variance of 1 

We co\ 
pixels wide) 


he value of To cannot be chosen arbitrarily, since it has to satisfy the 
ondition: 




e residual process defined as: 

Jl. if fi 7 ^ 9i 

Zi = < 

10 , otherwise 



that, given 7 , the correct value of T 0 can, in principle, be determined 
ve way, so that in this case too, the behaviour of the approximation 
ctivcly only on 7 . 

ven value of 7 , we can approximate the corresponding MPM estimate 
methods developed in the previous section, and compute the residual 
d the conditional (on 7 ) Maximum Likelihood Estimate of the error 
equations (22) and (23). The corresponding conditional estimate for T 0 


% = t (24) 

a 

jure the "likelihood" of the estimate /, we use the degree of uniformity 
ss") of the residual process 2 . This property can be quantified by the 
he local noise density, which we estimate as follows: 

er the lattice with a set {Sj} of m non-overlapping squares (say, 8 
For each square Sy, the relative variance of the noise density is: 
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with 


where \Sj\ is 


The des 


Pjl iesj 


the area of the j th square, 
red likelihood function is: 


m 


t-{f) = - E °i 

J =1 


( 26 ) 


which is equivalent to a x 2 criterion (Cramer, 1946) normalized to take into account 
the sample size. 

Alternat vely, one can use directly the likelihood that the residuals come from 
a uniform distribution. To compute it, we note that the quantities: 


= Y z i 

ies. 


are distributed according to the multinomial law: 


with 


where K is a 
(26) and (27) 
or when for 
adopt 


P(v • • -Vm) = ~j ~— 

u \\.. M m \\mJ 


1 


n = Ne = u\ + .. .i/. 


m 


Using the Stilling approximation we get the log-likelihood: 


m 


L{V\, . . V m ) = log P{V i, . . l/ m ) tssi - log + 

t=l 

+71 log fi log f] + K 
\mj 2 


(27) 


pnstant. We have found experimentally that both likelihood measures 
have a similar behavior when n is large. When n is relatively small, 
feme 2 , — 0, however, (26) is preferable, and so, it is the one we 
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Note that a more conventional likelihood function, such as the conditional 
likelihood ploposed by Besag (1972), will not work in this case; this function is 


defined as: 


£(/) = 


L\{f) + £2 (/) 


with 


i L k (f)= n PGi\7i,iZNi>To) = 

iec\ 

exp[-^- Ej6N,- V(fi, fj )]_ 

_ , U, exp(-Jr SygjVv VQi '/;)] + ex Pb jr Zjen V ( l - /•’4')1, 

= n (1 + « X P[^ E v 0i, 3y)ir‘. fc = 1.2 

i€6’ fc r 0 jeNi 

where the " odings" C { and C 2 are the sets: 

Ci = {i : (i t - is odd and y,- is even) or (z t - is even and y,- is odd)} 

C 2 =--{i : (a:* is odd and y t is odd) or (x t - is even and y»- is even)} 

with (x t *,y t ) denoting the row and column indices of site i (notice that, given the 
value of the field at the sites of Cu tbe random variables associated with any pair 
of sites of C 2 become independent, and viceversa). In our case, we find that as 7 
decreases, } becomes more and more uniform, while % remains almost constant. 
It is not difficult to see that as a result, the conditional likelihood L will decrease 
monotonically with 7 , which renders it useless for our purpose. 


The rai 
of systems c 

One ca 
we can use 
and a lowei 
get 70 = . 2 . 
MRF is bel 
and Snell, 1 
i), while fo 


ge of values [ 70 , 7 m] of the parameter 7 that corresponds to the class 
f interest can be determined as follows: 

1 show that for 7 > 8 we will always have /mpm* = to for ah so that 
7 M = 8 . The value of 70 can be obtained from an upper bound for e 
bound for T 0 . For example, assuming that e < .45 and T 0 > .5T C , we 
;. (Note that when the natural temperature T 0 of a first order, isotropic 
)w 0.5 times T c (the critical temperature of the lattice; see Kindermann 
980), the patterns become practically uniform (i.e., /, =constant for all 
• values of T 0 greater than 1.5T C , we get patterns that are practically 



indistinguisli 
practically aJ 


||ible from white noise. Therefore, the assumption T 0 > .5 T c covers 
the interesting cases). 


The complete estimation procedure is as follows: 


Maximum L 


1: Sample tm 


2: For each 


kelihood Estimation Algorithm: 


e interval [ 70 , im] at the points 


70 < 7l>---7n < 7 M 


7 £ Q — {7i> • • *7n} • 


2.1: Find f{i) usin S ( 9 ) and 


2.2: Colupute 2 using (23). 

Compute € using (22). If e = 0 , set L (/( 7 )) = -00 and proceed with the 
ue of 7 . Otherwise, compute a and go to 2.4. 


2.3: 

next 


vai 


2.4: Cd 


2.5: C 
3: Computd 


Upute r 0 using (24). 


(|mpute L(f( 7 )) using (25) and (26). 
the optimal estimate / using: 


The corresjonding e*,T 0 will be the optimal estimates for e and T 0 , respectively. 




/ = 7(7 ) : L(/(7 )) = sup l(/(7)) 

l€Q 


( 28 ) 


Remarks: 


1. This est 
the noisy < 
prior assun 
isotropic N 


mation algorithm allows us to reconstruct a binary pattern / from 
bservations g without having to adjust any free parameters. The only 
ptions correspond to the qualitative structure of the field / (first order, 
RF) and to the nature of the noise process, but neither the natural 
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!J**> is*.; 


w M n 


in! \yt\ y* 



Figure 9 . (a) 5 
complete scrie; 

temperature 
means that 1 
even if it ha 
algorithm to 
we show sue 
image (a) w 
shown in (c' 
shown in pa 


ynthctic image, (b) Noisy observations, (c) Maximum Likelihood Estimate, (d) A 
of estimates. The optimal estimate (for 7 = 2 . 9 ) is indicated by an arrow. 

T 0 nor the error rate e have to be known in advance. In practice, this 

/e can apply it to restore any binary image with uniform granularity, 

not been generated by a Markov random process. We have used this 

reconstruct a variety of binary images with excellent results. In figure 9 

l a restoration. The observations (b) were generated from the synthetic 

th an actual error rate of .35 (assumed unknown). The MLE for / is 

. A complete series of estimates /( 7 ), with 7 varying from .5 to 3.5 is 

tel (d). 


1 This proledure can be easily extended to handle any one-parameter noise 
corruption p -ocess (such as zero mean, additive white Gaussian noise). The extension 
to the case Df N-ary fields, i.e., to the restoration of piecewise constant images, 
is also straig itforward (using the general algorithm described in chapter 3 instead 
of (9) and (13) in step 2.1). As an example, we present in figure 10 the optimal 
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Figure II 
Optimal (Max 


restoration c 

3. We have 
a function < 
supremum i 
a coarse sar 
reduced int 
order of 15 

4. The who 
in parallel 1 

7. Formatio 






■ • •• • 






■v, 




(a) 

(b) 


S ■ 14• 


. (a) Original ternary MRF. (b) Noisy observations (additive Gaussian noise), (c) 
mum likelihood) estimate. ____ 

f a ternary pattern corrupted by additive white Gaussian noise. 

found that the likelihood function (26) is reasonably well behaved as 
if This permits us to perform the one-dimensional search for its 
i an economical way, by first determining its approximate location using 
ipling pattern, and then refining its position by a fine sampling of a 
■rval. In practice, it is possible to get very good results using on the 

samples. 

e procedure is highly distributed, so that it is possible to implement it 
ardware in a very efficient way. 

a of Perceptual Clusters. 


At the heart of a general purpose perceptual system, one must have a mechanism 
for deciding which parts of an image should be considered to "belong together 
(Marroquirl 1976). A simple instance of this problem is the grouping of dots in an 
image into Jerceptual clusters. Some heuristic schemes have been proposed to model 
this phenomenon (see for example, O’Callahan, 1974). We will show, however, how 


I 



















this problem 
as a particulaf 


;an be formulated in an elegant way that is also biologically motivated, 
• case of the reconstruction of binary patterns from noisy observations. 


The conceptual model for this formulation is as follows: 


Let us c msider the dots that form the original pattern as patches belonging to 


some objects 
this way, the 
these objects 


Suppos^ 
assume that 


of uniform color that are partially hidden, say, by some foliage. In 
formation of clusters is equivalent to the problem of reconstructing 
(whose cohesive nature is modeled by a first order MRF with Ising 
potentials) fif)m observations that are formed according with the following model: 

that fi = 1 only if an object overlaps the i th site of the lattice. We 

t e "foliage” will hide this point (i.e., make gi — 0 ) with probability 
irious values of g t - = 1 can appear in sites where /*• = 0 with a very 

ity p: 

with prob. (1 — e), if /,■ = 1 
with prob. e, if /* = 1 
with prob. (1 — p) } if fi = 0 
with prob. />, if /,• = 0 


fl, 

0 , 


Qi 


0 , 

U, 


with p < < |. The posterior energy is: 


Up{f;g) = W) + « E (!-*(!- «)) + 

-to i:fi=l 

+M (i -&{gi)) 



where U 0 (fi 


is given by (1) and (2): 

- 1 , 

V c (fi, fj) = ' 1, 

0 , 


if |i - j 1 = 1 and /,• = fj 
if \i - j\ == 1 and fi ^ fj 
otherwise 


Uo(f) = Y,V(fi,fj) ; 


6 and a. are 


defined in (5) and (6): 



if a — 0 
otherwise 
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its local vari 


where 


and z{ is de 
the residual 
that cover tl 


and M is a v *ry large number. 

The clui tering task is now equivalent to the problem of estimating / and the 
parameters a and T 0 from the noisy observations g. To accomplish this, we can use 
the method iescribed in the previous section, except that in this case, only those 
sites for whi :h /,• = 1 will be useful for the estimation of the residual density and 
i; nee. This means that equation (22) has to be modified to. 


-rag* 


A = {» : /< = 1} 

ned in (23). Also, the sets Sj used to compute the relative variance of 
density in (25) should now be taken as the intersection of the squares 
e lattice with the set A. 


With tl 
for clusteriil 
original dot 
of 7 — aTo 
that these 
and psycho 
to model hill 


8 . Discussid 


ese modifications, the Maximum Likelihood algorithm can be used 
g. Its performance is illustrated in figure (11) where we show, the 
pattern (upper left) and the recontructed objects for decreasing values 
The maximizer of the likelihood is marked with an arrow. We believe 
f reliminary results are encouraging, although, clearly, more numerical 
>] >hysical experiments are needed to assess the plausibility of this scheme 
man perceptual processes. 


In this 
constant fu 
to this pro 
of a gener 


n 


chapter we have addressed the problem of reconstructing piecewise 
ictions from noisy observations. We showed that the optimal solution 
I lem can be obtained from the observation of the equilibrium behavior 
ilized Ising net coupled with a spatially varying (but fixed in time) 


92 






Figure Ilf 
and die rccont 
(i.e., the optim 


Formation of perceptual clusters. We show: the original dot pattern (upper left) 

( ictcd objects for decreasing values of 7 = aT 0 . The maximum likelihood estimate 
1 clustering) is marked with an arrow. 


external field 
a criterion, 
MPM estimall 


• If we use the minimization of the expected segmentation error as 
e optimal estimate is the maximizer of the posterior marginals ( the 
[or which was described in chapter 3). 


th 


We com 
found that 
as the noise 
A consequen 
estimator ma 
advantageous 
binary signals 


(pared the relative performance of the MAP and MPM estimators, and 
moderate signal to noise ratios, they are practically equivalent, but 
increases, the MPM estimate is (sometimes dramatically) superior. 
:e of this analysis is that, if the noise level is not too high, the MAP 
be a reasonable choice in those cases where it is computationally 
This is the case, for example, of the reconstruction of one-dimensional 
where we derived a very efficient algorithm for its exact computation. 


fdr 


level 


v 


In the tv o-dimensional case, however, the situation is different: the general 
Monte Carlo procedure for the approximation of the MPM estimator is in fact more 
efficient, from a computational viewpoint, than the corresponding one for the MAP 
(Simulated A mealing), and in the case of binary fields, we derived a much faster 











deterministir scheme with excellent experimental performance. 

We alsc showed how these estimation procedures can be extended to the more 
interesting c ise where the parameters of the system are not known in advance. In 
this case, a n aximum likelihood estimation algorithm can be derived, which, using a 
likelihood fi nction that is computed from the residuals, allows for the simultaneous 
estimation o' the field and the parameters. 

We poi it out that although, for the sake of simplicity, we have concentrated 
on the case c f binary fields sent through binary symmetric channels, the results that 
we have pre lented can be generalized to N-ary fields and other noise models (see 
figure 10 ). 

The cor structions that we have presented can be applied not only to image 
segmentatioi and restoration, but to other problems as well. As an illustration, 
we presenter a novel application to the modeling of the process of formation of 
perceptual c Listers. Another important problem that can be formulated in this way 
is the recont uction of surfaces from stereoscopic pairs of images; we will discuss it 
in detail in c lapter 6. 


Appendix 4.A 


OPTIMALITY OF ALGORITHM Al 


In this albpendix we present a proof of the fact that the algorithm presented in 
section 4 of c lapter 4, effectively computes the MAP estimate for a one-dimensional, 
binary MRF 

The opt mality of the algorithm follows from the following propositions: 


Proposition 

suppose that 


detected by Al. 


Proof: 

Suppose 4 vj 
assume that 


|as detected by Al, and let L be the next boundary detected. We will 
7 ^ 4+i and arrive at a contradiction. We will consider three cases: 


Case 1: Supp 


Then, we mutt have that 


and there ford 


: Let S* = be the optimal boundary configuration, and 

jt, for k < n was detected by Al. Then, l k + 1 will be the next boundary 


bse Al detects Lai j < 4 +1 , 


U p (j) > U P (L) + U m (j) - U m (L ) + 2 


UWi,~.,h,L,j,l k+l ,...}) < U(S*) 


which is a co itradiction. 

Case 2: Supp )se Al detects L at j e (4+ 1 , 4 + 2 ]- 


This means tl 
particular. 


at at j we had that L was the optimal location for the boundary. In 


Pp(4+i) + U m (j) - U m {l M ) > U P (L) + U m (j) - U m (L) 
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which implies that 


^ p{L) + U rn (lk+ 2 ) U ni {L) < U p (l k+l ) + U m (l k+ 2 ) ~ ^m(^Jk+l) 
and therefoj e, 

U({h,...,l k ,L 9 l M ,...} < U(S*) 

which is a c mtradiction. 

Case 3: Sup Dose that Al has not detected any new boundary at j = l k+2 + 1. 
Then, we m 1 st have: 

UpiJk+2 + 1 ) < £4(4+2 + 1 ) + 1 

which mean 1 that 

U{{h,---l k ,lk+3>--} < U(S*) 
which is aga n a contradiction. 1 

Proposition l: If Al runs from left to right starting at a point / 0 , and generates 
the boundaiies {4,4,..}, then, lj £ S* (the set of boundaries of the optimal 
configuratioi) for j > 2 . 

Proof: 

Let / , f Ai b 2 the optimal configuration, and the one generated by Al, respectively. 
Let 

Lq = sup{j £ S* : j < li} 

L — inf{; £ S* : j > 4} 

If Lo — l Q , ve apply proposition 1 and finish the proof; so, let us assume that 
Lo 7 ^ lo, and that 4 was detected at i. We have two cases: 

Case 1: Lo ^ lo. We claim that in this case, 4 £ S* , and therefore, by proposition 
1, lj £ 5 foi j > 1 . To prove this claim, we consider two subcases: 

Case Ta: / ( Iq,Lq)) 7 ^ //u((4,Lo)). 


In this case, 


and therefo 


we have: 


2 4- U m [x) - U m (l j) 4 U p (l i) < U p (i) 


2 4- U m {i) - U m {h) 4- U p {h) - U p {Lo) < U p {i) - U p (Lo) 


which implils that h € S*. 


Case 1-b: / 
Suppose 1 1 


since othe 
this implies 


= fAi{{k,Lo)). 

S*. We have that, at location z, 

ih) 4- U m (i) - U m {h) 4- 2 < Up(Lo) + U m (i) - U m {Lo) + 2 
se, Iq would have been a better location for the boundary. However, 


U p {h) 4- U m (L) - U m (h) < U p (Lo) + U m (L) - U m (h) 

which meam that we can improve S* by moving Lq to h, which is a contradiction. 
Case 2: Lq < lo. 

Again, we cc nsider two subcases: 


Case 2-a: f* 


Lo,lo)) = fAi((Lo,lo)). 


Let U + , U- de the energy increments with respect to Lq: 


*4(0 = £ 

;'=Lo 


Note that 


U-{i) = E 

i=U 


u p {i) = £/+(*) - U+{1 0 ) and 
U m (i) = C/_(t) - U-{1 0 ) 
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Since /i wai detected at i, we have: 


j 2 + U m (i) - Um(li) + U p (h) < U p (i) 

and therefo e, 

2 + U.(i) - U-(li) + U + (h) < U+(i) 
which mean 5 that l\ E S*. 

Case 2-b: / ((Lo,/ 0 )) 7 ^ fAi{(Lo,h))- 

Using the s me definitions for U+, C/_, we have that, by the optimality of S\ for 
some j > L 

U-(j) - U-(L) + U + (L) + 2 < U + U) 

and therefoi e, 

C/_(y) - C/_(L) + U + (L) - U+ih) + 2 < U + {L) - u+(h) 

which mean 5 that if A1 detects l u it must detect L too, unless it detected l 2 first, 
but in this c ise we have that, for some p < 

U-{p) - U-(h) + U+(l 2 ) - U+(l 1 ) + 2 < U+(p) - U+(h) 
which impli :s that l 2 G S*. This completes the proof. 1 

It shoul d be clear that these results can be easily extended to the case where 
A1 runs bac cwards (from right to left). With this extension, we get the following 
complete op Jmal procedure: 

Algorithm A l: 

1: Run Al f om left to right. Detect {h,.. ., l n }. 

2: Run Al fc ackwards (starting from l 2 ). Get either 

•••>!«) or {/j } l 2 
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In either cas e, this is the optimal solution. 

The on y thing that remains to be proved is that the determination of the 
optimal loci tion for a boundary is in fact performed by step 2.1 of Al. We have 
the following: 


Proposition 3: Suppose that Al detected a boundary at (or started from) l 0 . Then, 
the optimal ocation l of the next boundary has to be updated only at places where 
si = —k ani l siml = k (note that in si we have stored the value of the maximum 
likelihood e ;timate f^ L , while siml = /{![). Suppose i is one such place. The 
optimal loca ion will be: 


t 


if U p (i - 1) - U m (i - 1) < U p i - U ml 
(the current value) otherwise 


Proof: 

First, we n( te that a necessary and sufficient condition for l to be the optimal 
location of tine boundary at the point i is that, for j £ [ l 0 , i — 1]: 


or equivalently, 


u p {i) + u m (i) - u m (i) < u p (j) + u m [i) - u m (j) 


u P (i) - u m (l) < Up(j) - U m (j) 


Suppose / v as the optimal location at i — 1 , and we process observation i. We 
consider sev ;ral cases: 

Case 1: sim = —k 

In this case, we show that l remains the optimal location: 

By construct on, we have that: 

U p (i - 1) = U p (i - 2) + *+*(?,-!) 

U m (i - 1) = U m (i - 2) + 
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Since siml =t — k we have that. 


’I'+tCsi-i) — 'f'-fc(fft-i) > 0 


and therefore. 


U p (i -1 


so that l red 


Case 2: simfl 


In this case we have that 


This means 
will be obtail 
by theorem 
be placed, it| 
suppose sim 

If 


U m (i 1) — C/ p (t — 2) — U m (i — 2) + i) > 

> U p (i - 2) - U m (i - 2) > U p (l) - U m (l) 
bins the optimal location. 


fiat 


then, 

because l wa^l 
token, it is cl 


= k 


U p (i - 1) - U m (i - 1) < U p (i - 2) - U m (i - 2) 


the minimal value for U p (i) - U m (i) on a block for which si — k 
led at the extremal point where si = -k and siml — k y and since, 
1 of appendix 4.B, this is the only point where a boundary might 
is sufficient to update the optimal location only at these points. So, 
= k and si = —k. 

Upt - U ml < Up{i - 1) - Um(i - 1), 


the new optimal location will be i —1.| 


U p i - U ml < U p (j) - U m {j) for j € [Zo, * - 1] 

the optimal location outside the last block where si — k. By the same 
ar that if 

U p i - U ml > U p {i - 1) - U m (i - 1), 
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Appendix 4.B 


DYNAMIC PROGRAMMING FORMULATION OF THE 
©NE-DIMENSIONAL MAP ESTIMATION PROBLEM 


In this 


which, based 
of one dimeii 


ppendix we present an algorithm for finding the global minimum of: 


Up = E V(U, f i+i ) + a * A («) 

»=1 t'=l 


( 1 ) 


|on dynamic programming principles, reduces the problem to a sequence 
sional optimizations. 


As we \*|ill see, this algorithm generates, as a byproduct, a family of solutions 
considered as descriptions of the field / at different scales, so that the 


the optimal d 


which can be 

coarse descri )tions, which are computed very fast, are progressively refined until 


A config 
L n defined bf 


Inest scale) configuration is found. 


This approach is based on the following idea: 

iiration / is completely characterized by the value of f u and the set 


We will call 
these boundat 
an equivalen 


he n elements of L n the "boundaries” of the configuration /. Since 
ies correspond to odd bonds between neighboring cells, we can define 
energy function as: 


For a fixed 4 
boundaries, t’ 


i n — {L : /l+ 1 } ; |L n | = n. 


( 2 ) 


C/(/) = n+-£/(/) 

with U(!) = E h € {^ 0 , * 1 } 


(3) 

(4) 


, U depends only on the value of / 1 , and on the position of the n 
at is, onn + l variables. To make this dependence more explicit, let 
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us define th 


functions 


Let Uq and L 
fi = k[ and 

We have tha 
U 0 (n, 


U\(n, 


and for n od< 

U 0 {n, L 


(Note that £ 

■ 

Let Sn\ 
Then, the opt 


G{L)= £(<*>*„(<?,)(5) 

; = 1 

denote the energy functions corresponding to the configurations with 
;o, respectively, for a given set of boundaries 

L.n (-Ll) • • -Ll ^ ^ Lfi (6) 

for n even, 

■») = n +f[]C E < Mi7y) + --- + E ^o(9 , ;)l = 

4 J—1 Li + 1 L„ + l 

= n + -[G(Li) - G(L2) 4 • • • — G(L n ) 4- $*ofe')] 

2 j= l 

«) = n + f IE E *ko(ffj) + • • - + E = 

z i=i Lt+i l„+i 

n 4 -[-C(L]) 4... 4 G(L n ) — G(N) 4- ^fcofaj)] ( 7 ) 

1 3 = 1 


►) — n + o [^(Zi) — <^(-^ 2 ) 4 ... 4 G(L n ) - G(N) 4 

3 =1 

^l(n, In) — n 4 - [-G(Li) 4 ... - (^(Ln) 4 *fcote)] ( 8 ) 

J J=1 

^jb 0 (gy) does not depend on /). 

3$ be the sets of boundaries that minimize U 0 and U u respectively, 
mal energy for a given n is: 

U* n = min[Uo(n, 5^), U Y {n, 5^)] (9) 
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We will defi le S n to be the corresponding optimal set of boundaries. 

The del xmination of SL** is an n-dimensional optimization problem. However, 
as we will show below, it is possible to decompose it into a sequence of one 
dimensional optimizations using a dynamic programming formulation. With this 
approach wt aJso get, as a bonus, the solutions s[ k \..k £ {0,1}, and 
the correspo iding optimal energies. If we set n = N, the solution to the original 
problem (3), U*(n*,S n ') can then be found by a one dimensional search. This 
strategy, hov ever, can be dramatically improved by the use of the following facts: 

(i) We can -educe substantially the search space for the location of the optimal 
boundai ies Lj £ S n *. 

(ii) The seq lences {U\, U* 3 , ...} and {U* 2 , U \,...} are unimodal. This, together 
with the fact that the dynamic programming algorithm uses Sy_t to compute 
Sj prov; des us with an efficient stopping criterion for the computation of 
the seqi ence {St,..S n -}. 

(iii) The exp xted value of n is usually small. 

We will now describe the algorithm, and analyze each one of these facts. 

1. Search Sp ice for the Optimal Boundaries. 

Let 

Pm {^Ti> M2 , • • •} ==: 

= { i : G(j - 1) < G(j) > G(j + 1), with G(j - 1) ^ G(j + 1)} (10) 

Pm — {mi,m 2 ,...} = 

= { h G(j - 1) > G(j) < G[j + 1), with G(j - 1) ^ G(j + 1)} (11) 

(Convention* lly we include j = 1 in if 0 < G(l) > G{ 2), and include it in P m 

if 0 > G(l) <: C(2)). We define the set P as 

P ~ Pm\J Pm — {-^lj • • •) TV} 

(Note that P corresponds to the set of places where the sequence {$k 0 {9j) [9j)} 

changes sign) 
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In what 


follows, we will call the elements of P Mt P rn and P % the "maxima”, 


minima", and "critical points" of <7, respectively. 


Let S„- 


whose cor re: 




then, 


Theorem 1: 
with probabil 


% 


(5 n *_) denote the subsets of S n - formed by those boundaries Ly 
onding term G(Lj) has positive (negative) coefficient in £/*., i.e., if 


Sn*+ — (f'l + fcj L\l+kt • • •} 


S n- = 5 n* ~ S n' + 


( 12 ) 


With thjse definitions, we have: 


|uppose that <&k 0 {9j) - $^( 0 ;) ^ 0, for all j (a situation that will occur 
ity 1 for most observation models). Then, 5 n - + C P m and 5 n *_ C P M . 


To see \|hy this is true, let f ML denote the maximum likelihood estimate for 
/ obtained 


f? L =\ k " 

lo. 


if **■(»>) > **.(</,-) 

otherwise 


and let /* be the optimal estimate. Suppose that for some j we have, say, Ly E 
£„•+ - P m - Suppose Ly £ (P k} Pfc+i), for some P k} P k + 1 6 P. Clearly, either P k E P m 
1. Suppose, for definiteness that P k e P m - 


or P k+1 e P, 
If Pkt 

than S n - (we| 
then either 


jp n -, the configuration {Li,...Ly_ 1 ,P*,Ly+ 1 ,...L n *} has lower energy 
decrease U without altering n), which is a contradiction. If P k e S n % 

or f\(L v P k+l )) f ML ((L jt P i+l )) 


and so, we get a lower energy configuration by deleting Ly and either P k or P k+l (we 
decrease simi ltaneously n and U). A similar argument can be used if Ly E [l,Pt) 
or Ly E (Pr,/ r ].l 
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This re mlt means that we can use P to constrain the search space for the 
boundaries of each subproblem (i.e., for each fixed n), which now becomes: 

For n <: | P | fixed, find S n = {L \,.. .L n } with 

S n + C P m and S n _ C P M (13) 

such that U( a, S n ) < U(n, L n ) for all L n C P. 

Note th it theorem 1 guarantees that the constrained and unconstrained solutions 
will coincide only for n = n*, so that for n 7 ^ n\ S n may, in general, be suboptimal. 

2. Dynamic Programming (DP) Algorithm. 

From e< [uations (7) and ( 8 ), it is clear that, for any fixed n, the determination 
of the optim ll (constrained) configurations sL 0) , sL 1} is equivalent to the solution of 
the optimiza :ion problems: 

For St? 1 : 

Minimize [G(Li) - Gfo) + ...] 
with L\ t L 3 ,... G Pm* snd L 21 L 4 ,... G Pm . 

For sL 11 : 

Maximize [G(Li) - G{bi) + ...] 
with L u Z*,... G Pm* and U* L At ... G P m . 

Let us c Dnsider the maximization problems. Assume, for definiteness that the 
first critical 1 oint of G is a maximum, i.e., Mi < mu and define the sequences: 

Z>i(A:) = sup G(Mi) 
i>k 

Li{k) = (min L : G(M L ) = Di(k)}, k = 1 .. .\P M \ (14) 

Clearly, M Ll ^ is the optimal location of the boundary for n = 1 (i.e., 
= {m Li !)}), and from £h(l) we can easily compute the corresponding energy. 
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We now den 


and 


for k = 1, 


and the optimal energy is: 


For n even, we define: 


D 


and get: 


ne, for j > l: 


D 2 j{k) = sup{D 2 y_i(z + 1) - <7(m t )} 

t>Jfc 

£> 2 j+\(k) = sup{Z>2i(0 + G(Mi)} 

t>k 


£ 2 j{k) = {min L : D 2j {k) = D 2 j-i{L +1) - G(m L )} 
I< 2 j+i(k) = {min L : D 2j+i {k) = D 2j (L) 4- G(M L )} 

, \Pm | — j. One can check that, for n odd. 


^l( n ) = n + f [—Z>n(l) + £ 

2 y 


£>!(/:) = sup{-G(m,)} , A; = 1,..|£ 

i>k 


m 


Li[k) — {minL : D’^/c) = — G(mi)} 

= suppjy-^i) + G(M,)} , k — — y + 1 

L 2j (k) = {min L : D 2] {k) = D 2l ^(L) + G(M L )} 

D 2j+ 1 3up{Djy(i + 1) - G(m,)} , k = 1, . . .\P m \ - j 

i>k 

I^2j+ 1 W = {min L : D 2j+1 {k) = D 2] - G(mj-)} 

• s n ) = 

^l(n) = » + f [-^(1) - G(N) + £ •*,(#)] 
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For the mini 

that Mi < 7? 


and for j > 


with k varyi 


For n odd: 


with the com 


The case for 

The recu 
allow us to o 


nization problems, that is, for the computation of sL 0) , assuming again 
i, we have, for n even: 

d ii k ) = 

t>* 

h [k) — {minl : d[(k) = —G(mi)} 


d 2j (k) = .inf {d 2j -i{i) + G(M{)} 
kj{ k ) = {min l : d 2j (k) = d 2 y-i(0 + £(M/)} 
d 2 j+i( k ) = mf (rf 2 y(* + 1) - G(m»)} 

* 2 y+l(&) = {min l : d 2j+l (k) = <f 2 y(* + 1) - G(m/)} (21) 

5 in the appropriate range. The solutions are: 

5 « 0) = { M Mi)’-*-’ m MM---(Mi)M} 

U 0 (n) = + |[rf„(l) + £ ^jko(^y)] (22) 


d\{k) = inf {G(M t )} 

t>k 

d 2 j( k ) = mf {d 2 y_i(* + 1) — ^(m t )} 

4+1W = W {4(0 + G (M t )} (23) 


sponding definitions for /’■(&). The solutions are: 


5 " 0) — ( M L(i) . 


Uo(n) = n + | [<4(1) - G(iV) + £ *i.(9j)] (24) 

y 

/hich 777i < Mi is treated in a similar way. 

•sions (15), (18), (21) and (23), together with equations (9) and (10), 
•mpute the sequences {Si,S 2 ,...} and {U\,U* 2} ...} using only one 
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dimensional 
value n for 

3. Stopping < 

In this < 

Theorem 2.! 

{5,,5 2 ,...} 
U* n ) and tha 

This res 
programmin 
minima for 
we can term 

To prov 

Lemma 1. 1 
boundaries ( 
k + 2 , respe< 
refinement o 

Proof: 

We will 
We consider 

Case 1: Supf 

In this case , 1 


optimizations. We now turn to the problem of determining the optimal 
the number of boundaries. 

Criterion. 

ection we prove the following: 

luppose that every (constrained) optimal configuration in the sequence 
s unique (i.e., for every n, if S’ n ^ S Tl , and S' n C />, then U(n,S' n ) > 
for some n, U* n+2 > U* n . Then, U* n + 2k > U * n , for all k > 1 . 

jit will provide us with an efficient stopping criterion for the dynamic 
; recursions described in the previous section; since the first local 
he subsequences {U\ } U * 3 ,...} and {t/J, U \,...} are the global ones, 
nate the computations once we have found them. 

i the theorem, we will need the following lemmas: 

.et S* = {L \,.. ,,L k } and S k+2 = {L \,.. >L k+2 } be the optimal 
vith corresponding configurations f k and f k + 2 ) for n = k and n = 
tively. Suppose that k + 2 < \P\. Then, S k C S k+2 (i.e., S k+2 is a 
' St t), provided S k is unique. 

assume that for some j, Lj £S k - S k+2 , and arrive at a contradiction, 
three cases: 

ose that for some *, 

[44 + iiri5t=0 

i/e claim that we can find some index p such that 

[44+i]f|5t = 0 
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and 


Supposf 


fk+2{{Lp, L p + { ]) 7 ^ f k ((L p ,L p+l ]) 


Supposl that this is not the case. Then, are the only elements of S k+2 

in some inte rval (Ly, Ly +1 ) (or in one of the extreme intervals [ 1 , Li),(L k , N]) and 


/fc+2((f' t > A+i]) 7^ 1]) 


By condition 


[L{, T t ‘ + [] G (Lj f L/j+ 1) 

(13), we have that Ly 7 ^ L i _ l (otherwise, Ly would be a local maximum 
and minimiln of G at the same time). But then, since S k is optimal, we can find 
a configurat on with k + 2 boundaries whose energy is lower than that of S k+2 , 
by moving 1 ’• to Ly (or L ^ +1 to Ly +1 ), which contradicts the optimality of S k +2- A 
similar argui lent holds if 


This proves 


pur claim. 


So, supi ose that 


and 


Form 


and let f k b f 
/*( 1 ) (and th 


or [Lk,N] 


Let A U 




fk+2{[L p ,L p+ 1 ]) 7 ^ fk((L p ,L p + 1 ]). 


^ k {-^li • • •» ^p—l) L p + 2> • • •> f'jfe-f 2 } 


the corresponding configuration, chosen in such a way that f[( 1 ) 
before, = fk([L' P , L' P+ i)))- 


be the change in U (see eq. (4)) associated with setting: 
f([ L ’ P > l 'p+i}) = fk+2{[L p , L p+l ]). 
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We hav| that 


Now, we pu 


Since S* is optimal, we have that: 


which contradicts the optimality of S*+ 2 . 


Case 2: 


Suppose that 


Otherwise, if] 
are in case 1 
configuration 

So, 

By a simj 


Now, proceeding as in case 1, we form: 


and let /* bq 
fk( 1) 


U{S k+2 ) = U(s' k ) + A fir. 


^k+2 1» * * •> Lj, Lp t Z/*}. 


A A 


C/(S* +2 ) = C/(S*) + AC/ > C/(S*) + AC/ = U(S M ), 


ai^;]Uft + 2.^)na-i 

Lj € [l, Z^). We must have 

A+2([l> Z/jJ) 7^ /&([l>Zq)) 

Li = L'z , condition (13) generates a contradiction; if Li > Z^, we 
and if L\ < L' 2 , S* +2 is not optimal, since we get a lower energy 
by moving L\ to L\. 

fk+ 2([l>Z'i]) 7^ /*([1, Lj]). 

lar argument, we get that 

fk+2([L k +2t N]) 7 ^ fk{[Lk+2) N]). 


S* {Z^ 2 » ♦ • •> Z/jfc+i} 


the corresponding configuration, chosen in such a way that /*(1) = 
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Let A U be the change in U associated with setting: 

f{[L\tL 2 ]) = fk+ 2 {[Li, L 2 ]) and 
f({Lk+i>Lk+ 2 ]) = fk+ 2 {[Lk+\, L k+2 ]) 


so that 

U(S M ) = &{S' k ) + A U. 

Now, we for n: 

*^jfc+2 == {^i> ^i» • • •» Lk, L k+2 }, 

Since S k is c Dtimal, we have that: 

U(S k+2 ) = U(S k ) + At/ > t/(S*) + At/ = f>(5^ 2 ), 
which again :ontradicts the optimality of S k + 2 - 

Case 3: 

For all i, [4,L- +1 ]f|5/k^0, 

and ([1, Li] (J[L k+2f N]) f) ^ 0 (*) 

To make (*) hold, we must be able to place k boundaries in it + 3 (ovelapping) 
closed interv; Is, without omitting any interval. Moreover, since condition (13) must 
hold, we can lot put Ly = L\ and Ly+i = L ' i+2 for any i,j. But this is impossible; 
so, our proof is finished. 1 

Lemma 2. Le At/* = U{S k )-U(S k+2 ). Then, At/* < A&*_ 2 , for all k £ [3, \P\—2 ). 
Proof: 

Considei the optimal configurations S k , S k+2 , S* +4 , and suppose that A U k+2 > 
A U k . Using bmma 1, let 

Sk ==: {Lj,.. L k }) 

^+2 {^1 ) • • •» Lj, L 2 1 • • v Ijt} • 


By condition (13) and lemma 1, there are only two valid forms for S* +4 . We 
consider eacf case separately: 


Case 1: S* +4 is of the form: 


i (Z>i> • • •> Lp, Z/j, L 2j Lp- )- 1 ...) Zj, L 2 ;.. 


(i.e M the refinements corresponding to S k+2 and S k+i are disjoint). 


Then, f< 


> - ti it , 

^ k +2 (TI, . . Z/p, Lp+i ,. . Z/j , Z/2 j . • ./> 


we have 


(^ +2 ) = U(S k ) - AU k+2 < U(S k ) - AU k = U(S k+2 ), 


which is a contradiction. 


Case 2: Sjt+ 4 is of the form: 


^k+4 {L u ...,L jt L lt L lt L 2 t L it ...} 


(i.e., S k +4 is | subrefinement of the refinement introduced by S k + 2 ). 


a U({L U . . Lj, L±, , Lj+x ,.. T U{S k ) 


We have tha 


b — U{{L \,..., Lj, Li, L 2 , Lj+i ,...} — U(S k ) 
c U({L h .. Zy, L 2 , L 2 , Lj+i, ...)■ + £/(£*) 


AU k = a + c — b 

&U k+2 — 6 . 


By assumpti 


: I 


b > a + c — 6 
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and therefor 


Now, let S' k 


Then, 


AU k = a + c — b < —— < max(a, c). 

2 

2 be formed from S k by the refinement: 




if a = max(a, c) 
if c — max(a, c) 


m 


IS' M ) = U(Sk) - max(a,c) < U(S k ) - A U k = U(S k+2 ), 


which is a contradiction.! 


Now we prove theorem 2: 
Suppose U * k+2 > U* k . Then, 


k + 2 + ^(S* +2 ) > k + ^(S*) 


now, by lemma 2 we have: 


'+4 = k + 4 + |a(S t+4 ) = k + 4 + | {U{S k ) - A U k+2 ) > 
>k + 2 + |(£(S*) - A U t+2 ) >k + 2 + |(t>( 5 t ) - A U k ) = 

= k + 2 + ^u(s k+2 ) = u ‘ k+2 i 


4. Expected ¥alue of n\ 


First, we compute the (prior) probability density function p(n) for the number 
n of odd bon is in the original field /. 

Let Nb = = N — 1 be the total number of bonds. We can rewrite equation (1) 


P(u = /) = 


( 25 ) 
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The touf number of configurations compatible with a given n is 2 C„\ and so, 

2C?,r>exp[l(jV t -2n)] _ 

E* W i„C^cxp[i(N t -2fc)] 


which is a bi 


homial distribution. Therefore, 


We note th 
exponentially 
too high, we 


An inte 
each of the 


= c nJ e‘/° \ N '-r e-'/° \ n 

n V e i/« + e-i/aj Ve l / a + e -1 / a / 


(26) 


n) = nJ ----- 1/0 ) 

Ve'/“ + e _, / a / 
■[„] = at/ -- -) 


u „ / ( 27 ) 

e l/a + e ~l faj 

as a | oo, E[n] f Nb/ 2, and as a | 0, E[n\ | 0 (and var[n] | 0) 
fast. This means that if the natural temperature of the system is not 
lean expect that n\ the MAP estimate for n, to be relatively small. 




5. Relation to Multiscale Filtering. 


If we in 
a family of 1 
formulate thii 
"continuous 


bsting characteristic of the DP formulation is that the solutions to 
ubproblems (which in fact correspond to a minimization of U (eq. 
(4)) are independent of the value of the parameter a. The role of this parameter 
is to determine the number of regions (n*) that will be present in the optimal 
In this sense, it can be regarded as a "scale" parameter that controls 
the aggregation of the subregions into larger units, and the algorithm can be used to 
produce muljiscale descriptions (in the style of the "fingerprints" treated in Witkin 
iille and Poggio (1983)) of the input signals. (Several other heuristic 
solutions to this problem have been proposed. See, for example, Blumenthal et al., 
1977; Prazdn r, 1982 and Pavlidis, 1973) 


erpret the algorithm in this way, it becomes natural to ask whether 
pear operators can do the same job in a much efficient way. Let us 
question in more precise form (in what follows, we will consider a 
time" problem obtained from the original one as a limit when N | oo 
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(provided th< t the observations are different from 0 only in a finite interval), since 
it simplifies tie notation. It should be clear that the same arguments apply to the 
discrete case] 

Conside a family of filters {F L } with the following properties: 

(i) Each Fi\x) is a symmetric and non-negative function of x. 

(ii) For each L, F L (x) is a decreasing function of |x|, and F L {x) [ 0 as |x| | oo 
fast eno igh, so that F L can be approximated by a function with finite 
support. 

(iii) All the f Iters are normalized: 


r°° 

/ Fi(x)dx = 1, for all L. 

J — oo 


(iv) The filters become sharper as L j 0: 


J 0 Fui x ) dx < j Q Fu( x ) dx 


implies that Lq > L\ 


Particular examples of acceptable families are: 
(i) The family of rectangular boxes Bi. 


Bi(x) = {i- 


if |x| < L 
otherwise 


(ii) The family of Gaussian Kernels: 




Suppose we convolve the function g(x) - \ (g(x) is a continuous time 
approximation to the observations) with a set of filters from the family {F L }. 
If we start wifh L large enough, the function 


h L = (9-\) * F L 
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will be practically constant, and therefore, it will have no zeroes. As we decrease L, 
zero crossing] of h {j will begin to appear. To each of these zero crossings, we will 
associate a bq rndary, and form the configurations Si, S 2 , .. . with 1 , 2 ,... boundaries 
respectively, 1 hat correspond to the first, first two, etc. zero crossings of h L (we are 
ignoring, at t! lis point, the question of the precise localization of these boundaries. 
With additional contraints on the family {F/J, it is possible, in principle, to localize 
them by dec easing L in a continuous fashion, and then tracing the position of 
each zero cro ssing to the finest (L = 0) level; see Yuille and Poggio (1983). For 
the moment, let us assume that we can identify the zero crossings of g - \ that 
correspond tc those of hi, for all L). 

The ques tion that we ask is the following: 

If S\,S 2 ... are the optimal boundary configurations produced by the DP 
algorithm's it true that 

Sk = St 

for all kl 

As we now show, this is not the case. 

Consider the signal g(x) defined by: 

g(x) = 1 , 

for x G [/ 1 , l\ + 2a] U[^ 2 ? h + 26] (J [^2 *F 46 ,12 + 66] (J 
(J[T + 86 ,12 + 106] ^J[^2 + 126 ,12 4-146] (J[^2 T 166 ,12 186] , 

and g(x) = 0, otherwise. Here, luhyo- and 6 are some positive numbers chosen in 
such a way tl at, if Lo is the starting L, we take l 2 - h - a >> Lq, so that, by 
property (ii), there is no interaction between [hJi + a] and [l 2 , h + 186] (see figure 
4.B.1). 

Suppose |hat the zero crossings corresponding to [li,li + a] appear first (as a 
single double zero) at L = L\, and those corresponding to [^ 2 , h + 186] at L — I 2 . 
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Figure 4.B.I. (See text). 


Then, 


J Q F Ll (x)dx = J* F l ,( x)dx 

rb f56 rQb 

/„ F U x )dx + J 3i F u (x)dx + J n F u (x)dx = 

r$b * 7 b *00 

= J b F u( x )dx + J si F Ll (x)dx + J u F L ,(x)dx 


Now, for a > 6, we have: 


U({h,k}) = 10b 


U({k,U}) = Sb + 2a > U{{l h l 2 }) 
and therefore, S 2 = {/ 1 , ^ 2 }. 

We claim that we can find some a, b with a > b such that 

J 0 F L,( x )dx < J a F L,(x)dx 


(28) 


(29) 


If this is true, we find, using (28) and conditions (iii) and (iv), that it implies that 
L 2 > L u and therefore, S 2 = {l 3 , U}. 
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We now prove our claim: 


Let a = b + §, where we choose e so that 




(property (ii) guarantees that we can find such e). From (29), 



and from (30), 


roo /-oo *oo rb+e/2 

L F ^i x ) dx = L /2 f lA x ) (1x = J b F La (x)dx — J F,Jx)dx = 


'b+e/2 



This result does not mean, of course, that families of linear filters cannot be 
used for producing useful multiscale descriptions of signals; it only means that these 
descriptions cannot, in general, be considered as MAP estimates of MRF models. 


6. Continuous Valued Fields. 


In this section we present a related problem which can, in principle, be solved 
using the DP approach, although, as we will see, in a less efficient way. 


Let us consider the problem of estimating a piecewise constant signal corrupted 
by additive white Gaussian noise. We model the signal {/,} as a MRF with potential 


y{fu /.•+!) = { h 
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if /* == fi+l 
otherwise 


(31) 



and global states distributed according to: 


P(F = /) = 



E V(f it f i+1 )} 
*=1 


The observations are given by: 


9i = fi + n * 


where n is a white Gaussian process. The Bayesian (MAP) estimate for / is again 
found by minimizing eq.(4): 

u{f) ^n+lu 

U = £(/.- - gi f 

i+1 


where n is the number of places where /,• ^ f i+u and 7 == Note that in this 
case, fi is not restricted to { 0 , 1 }, but can take any real value. 

Proceeding as we did in section 2, we consider the sequence of subproblems 
obtained by putting n = 0 , 1 , 2 ,.... 

For any fixed n, U will depend only on the n integer variables that correspond 
to the location of the boundaries between regions of constant /, since given these 
boundaries L = {L\,...L n } y the optimal estimate for / on any interval (L,-,Z* + 1 ] 
(we put Lq = 1 and L n+l = N) is: 


/((A>£*+iD = j - y S 9y 

L *+1 “ L i j=Li+l 


If we define G k) i (for k < l) as: 


G k ,i = (1 - 2(1 - *)) 


ih s «y 

1 K i=k+i j 



We get that: 


N n+1 

U(Ln) — £ 9i + GLi-uL,- 

*'= 1 j=l 



119 




(note that £ 0 ? is a constant for a given set of observations). Using dynamic 
programming principles, we can now write the recursions: 

F 0 {k) — Gk,N > k = 0,.. .N — 1 

Fj+i(k) = , k = 0, .. .N — j — 1 

L j+l (k) = {L : G jb , L + F i (L) = F y+1 (fc)} (34) 

The optimal solution, for each given n is: 

S n == (L n (0), ^n-l(-i^n(0)), . . Li(Z^(. . .(L n (0)).. .)} 

and the corresponding energy, 

C/(n,5 n ) = n + f[£,? + J P n ( 0)] (35) 

z «=l 

The solution to our problem will be S n *, where: 

c/ ( ri *)‘ s 'n*) = inf{l/(n,5 n )} (36) 

Unfortunately, in this case we cannot guarantee the unimodality of any subsequence 
of {U{S n )} (although we believe that the sequence will be unimodal in many cases) 
and so, (36) has to be computed, in principle, by an exhaustive one dimensional 
search. Another unpleasantness is that, unlike the binary case, the search space for 
the variables L t - cannot be reduced in any obvious way. 
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Appendix 4.C 


CONSISTENCY CONDITION FOR THE MPM ESTIMATOR 


In this appendix we present a proof for the consistency condition (given 
by equation (11) of chapter 4) satisfied by the MPM estimator of a binary, 
two-dimensional Ising net: 

Theorem: Let P(f, g) be the posterior distribution corresponding to the estimation 
of the first order, binary MRF / from the observations g which are obtained as the 
ouput of a binary symmetric channel: 

P(f, ?) = 4 exp[- E V Vi, fi) - 7 D 1 - Hfi - ?.))] 

ij i 

Let / be the MPM estimator for /. Then, for every site t, 

E. P(f,9)> £ Ptf.g) 

implies that: 

E. P(f,f) > E. P(f,f) 

f'fi—fi f'fi 7 *^ /( 


Proof: 

Let: 
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1) We first prove that for all i: 


£P(/ (i) ,g) > EP[h®g) 

f f 


implies that: 

E p (/ W .» W )>E^ W ^ W ) • 

/ / 

Suppose that g ^ g® (otherwise, the above is obviously true). For any fixed /, we 
have that: 

P(f { '\ g) - P(h {,) , g) = 


= /C{exp[- £ y(fi, fj) - 7 ] - ex P [ E V (1 

j<EN, 


and 


i>( / W ) gW)_p( A , . ) iff (.)) = 

= K{exp[- Y. VQi, /y)] - exp[ Y V(1 - /y) - 7 ]} 


Where K is a constant. Since 7 > 0 , this implies that: 

P(f {, \ g) - P(h®, 9 ) < p (/ ( ’>, 9 W) - p(fcW ( ffW) 

so that 

Ef(/ ,il . 9 , ‘ ) )-f(A (, lj(«)) > YP(f { ' ] ,g)-P(^,g) > 0 

/ / 


2) Let r, == 1 — / t -. We now prove that if: 


then. 


for all j. 


E P(f,g)> E P(f.g) 

E P(f,g {i) )> E ?(/.9 w ) 

/:/,=/< f'fi=n 
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For i = j, part (1) applies, and for pM = g, the assertion is obviously true, so 
suppose i ?£ j and ^ g. We have: 

= AT^expI- Y, V(fi, fj)\ ~ «tp[ E ^t 1 ~ fj) ~ 7 ]} 

ieNi jeNt 

P{f {i) , 9 U) ) - P(h (, l g M ) = tfi{exp(- £ Hfi, /,)] ~ 

j'e/v, 

ex P[ E v ( l ~ /i. fj) ~ 7 ]} ex p[—7(1 - 2 (fj - 9v )) 2 ] > 

j€Nt 

for some constant K\ % so that 

E® M ) - P(A«, E ^(/ w . ?) - P(h {i) , 9) > 0 

/ / 

The theorem is now proved by assuming that 

E P(f,9)> E JUs) 

/:/<=/< AM/, 

and succesively replacing gi by for * = 1,2,.. .and using (1) and (2) to show that 
the corresponding inequalities hold at each step. 
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Chapter 5 


RECONSTRUCTION OF PIECEWISE CONTINUOUS FUNCTIONS 


1. Introduction. 


In this chapter we will illustrate the application of local spatial interaction 
models and estimation techniques that we have described to the solution of the 
general reconstruction problem that we introduced in chapter 1. To make this 
discussion more specific, we will consider a particular instance of this problem: the 
reconstruction of piecewise continuous functions from noisy observations taken at 
sparse locations. 

In this reconstruction, it will be important not only to interpolate smooth 
patches over uniform regions, but to locate and preserve the discontinuities that 
bound these regions, since very often they are the most important parts of the 
function. They may represent object boundaries in vision problems (such as image 
segmentation; depth from stereo; shape from shading; structure from motion, etc.); 
geological faults in geophysical information processing, etc. 

The most successful approaches to this problem (see Terzopoulos (1984)) consist 
of, first, interpolating an everywhere smooth function over the whole domain; then, 
applying some kind of discontinuity detector (followed by a thresholding operation) 
to try to find the significant boundaries, and finally, to re-interpolate smooth patches 
over the continuous subregions. 

The results that have been obtained with this technique, however, are not 
completely satisfactory. The main problem is that the task of the discontinuity detector 
is hindered by the previous smooth interpolation operation. This becomes critical 
when the observations are sparsely located, since in this case, the discontinuities 
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may be smeared in the interpolation phase to such a degree that it may become 
impossible to recover them in the detection phase. 

One way around this difficulty is to perform the boundary detection and 
interpolation tasks at the same time. In the method we will present, this is done 
by using a Bayesian approach, and including in the posterior distribution our prior 
knowledge about the smoothness of the function and about the geometry of the 
discontinuities, as well as the information provided by the observations. Before 
describing how this is done, let us formulate the problem in a more precise way. 

Consider a region 0 of the plane which is formed by a number of subregions 
separated by boundaries which are known to be piecewise smooth curves. Suppose 
that within each of these subregions, some property / (in what follows, we will refer 
to / as "depth’*) varies in a smooth fashion, presenting, at the same time, abrupt 
jumps across most of the boundaries. Suppose also that we have measurements for 
the values of / at some discrete set of sites 5; these measurements will, in general, 
be corrupted by some form of noise. 

Our problem is then to estimate the values of / on some finite lattice of points 
L C ft, and to find the position of the boundaries, using all the available information 
in an optimal way. 

2. Posterior Distribution. 

To apply the general recontruction algorithms developed in chapter 3 to this 
problem, we need to cast it in probabilistic terms. The main issue here is the 
representation of the concept of "piecewise continuity" in the form of a prior Gibbs 
distribution in a meaningful way. 

This could be done, for example, by modeling the function as a first order, 
continuous valued MRF with nearest neighbor potentials given by: 

[(/. - fj) 2 , if |/.- - fj | < a and \i -j| = l 
v Ui, fj) = 1 4, if \fi - fj\ > a and |t - j\ = 1 

to, otherwise 
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where a and 6 are positive constants such that 6 > a 2 , and for every pair of 
neighboring sites i,j , |/ t - - f 3 \ < a if both i and j lie in the same smooth patch, 
and \ fi - fj | > a, otherwise. 

This scheme, however, has the disadvantage of not allowing for the explicit 
modeling of prior knowledge about the geometry of the curves that bound the 
smooth patches (the fact that they should be piecewise smooth curves, for example). 
A more flexible construction involves the use of two coupled MRF models: one to 
represent the function (the ’’surface") itself, and another to model the curves where 
the field is discontinuous. A coupled model of this kind was first used by Geman 
and Geman (1984) in the context of the restoration of piecewise constant images. 
We will now describe their work in detail, and define a related model that can be 
used for our problem. 

2.1. Coupled Line and Depth Models. 


In Geman and Geman’s work, the intensity of the images is modeled using a 
first order MRF with generalized Ising potentials (see chapter 4). The boundaries 
between constant regions are modeled using a "line process" l, which is a MRF 
whose associated random variables are located at the sites of the dual lattice of 
lines that connect the sites of the original intensity lattice (see figure 12). These 
variables may be binary (indicating the presence or absence of a boundary between 
two pixels), or may take more values to indicate the orientation of the boundary as 
well. In both cases, their function is to decouple adjacent pixels, reducing the total 
energy if the intensities of these pixels are different 

This is done by modifying the prior energy function; the new expression is: 


where 


Uo(f, o = £ £ Vf(fi, la) + £ v c ,(i) 

i j'eN, Cl 




if kj is "on" 
otherwise 
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OXOXOXO* Oxoxo 

x x x x x x x 

oxoxoxoxoxoxo 

X X X X X X X 

oxoxoxoxoxoxo 

X X X X X X X 

oxoxoxoxoxoxo 

X X X X X X X 

OXOXOXOXOXOXO 


Figure 12. Dual lattice of line elements (sites denoted by x) 


0X0 
x x 
0x0 


X o x 


X 

o 


X 


(a) 


(b) 


Figure 13. (a) Cliques for the line process used by Geman and Gcman. (b) Additional cliques 
used to prevent sharp turns. 


V is defined in equation (1) of chapter 4: 
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f-1, if |t - j\ = l and /,• = / y 

v (fi> fj) = 11. if |* — j'| == 1 and /, ^ / y 

lo, otherwise 

kj is the line element between sites i and j, and the line potentials V Cl have as 
supports cliques of size 4, such as the one shown in Fig. 13-a. Every line element 
(except at the boundaries of the lattice) belongs to 2 such cliques. The values of 
the potentials associated with each possible configuration of lines within a clique 
must be specified. Thus, for example, if straight horizontal and vertical boundaries 
are likely to be present, a binary process, with potential values as those of Fig. 
14 is used (rotational invariance is assumed). In more general situations (such as 
piecewise smooth boundaries), we may use different values for the potentials, or we 
may allow more states for the line elements, corresponding to different orientations, 
augmenting consequently the table of values for the potentials. 

2.2. Models for Piecewise Continuous Functions. 

The model we have described can be adapted to our problem by modifying 
the choice of the potentials and the neighborhood structure of the coupled MRFs. 
Specifically, the following modifications are needed: 

1. Since in our case the observations are sparse, it becomes necessary to expand 
the size of the neighborhoods of the line field, to prevent the formation of "thick” 
boundaries between the smooth patches (i.e., adjacent, parallel segments of active 
lines in these regions). In particular, we propose that the dual lattice be 8-connected, 
with non-zero potentials for the cliques of the form illustrated in figure 13 (a) 
and (b). The inclusion of the cliques of figure 13-b has the additional advantage 
of penalizing the occurance of sharp turns, permitting us to model the formation 
of piecewise smooth boundaries (a more general case) using a binary line process 
instead of the 4-valued process proposed by Geman and Geman. The potentials 
for these cliques are computed in the following way: 

Let V a , V b denote the potentials associated with the cliques C a , C b of figure 13 
(a) and (b), respectively, and let S k (k £ { a , b }) denote the number of line elements 
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O I o 

O I 0 
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O I o 

V Q = 0.0 

Vi = 0.25 

V 2 = 0.8 

O I o 

o I o 

O I 0 

O I 0 

o o 

O I o 

V 3 = 1.2 

V 4 = 2.0 

V 5 = 2.0 


Figure 14. Potentials for the different configurations of a line process 
belonging to C k that are "on” at a given time, i.e., 

Sk= 52 U , k — a, b 

iec k 

The potentials V k are given by: 


V h = fi+klS k ) , k = a, b (2) 


where 0 is a constant, and the functions <f> k are defined by the following tables: 
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2 


3 


S a 0 1 
<f> a 0 0.4 


0.25 1.2 


4 

2.0 


5i 0 1 2 
<}> b 0 0 10 


It is not difficult to see that this choice of potentials (notice that V a will be 
slightly different from the definition of figure 14) will effectively discourage both 
the formation of thick boundaries (Sb — 2) and the presence of sharp turns ( S a = 3 
and/or S b — 2). 


2. The potentials of the depth process, which is now continuous-valued, have to be 
modified to express the more relaxed condition of piecewise continuity (instead of 
piecewise constancy). Specifically, we propose: 



(fi - /;) 2 ( 1 - fci). 

0 , 


for \i-j\ = 1 
otherwise 



(note that / tJ - 6 (0,1}) 

3. Unlike the case of piecewise constant surfaces, we now have to worry about the 
maximum absolute difference in the values of two adjacent depth sites that we are 
willing to consider as a "smooth" gradient (and not a discontinuity). This value, 
which in general is problem-dependent, determines the magnitude of the constant 
/3 in equation (2), which can be interpreted as the coupling strength between the 
two processes. 


2.3. Model for the Observations. 


We will adopt the general model described in section 2.1 of chapter 3 to 
represent the observation process. In particular, to make the discussion more 
specific, we will assume that the observations g correspond to samples of the surface 
/ taken at a set 5 C L of sparse locations, corrupted by a zero mean, white, additive 
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Gaussian noise process: 


9i = fi + n t - 


so that the conditional distribution is: 


P/\g{9> f) = II ~— ex P[-(/i - 9i) /** ] 

\j9/KO 


our results, however, can be extended to handle other noise models as well. 
Using Bayes’ rule, we can finally write the posterior distribution as: 


p f,i\ g {f> l >9) = ~2p ex P [~ u r(f* l '> 0)] 


with 

10 

+--2 E(A - 9.) 2 + E Vi(0 + E rn (4) 

Z(J i€S Ca G b 

V a and V b are the potentials corresponding to the "a" and "6" type cliques of the 
line process, and are defined by equation (2). It is convenient to introduce a function 
q which is equal to 1 only at those sites where there is an observation, and is equal 
to zero elsewhere (i.e., q is an indicator function of the set S ): 

fl. if ies 

9i = u . (5) 

10, otherwise 

Using this function, and the definition of V from equation (3) we get: 


Up(f, ht)- t E(/< - /,f( 1 - kj) + 

E(/< - +e m +E m (6) 

ieL c* c b 
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3. Optimality Criterion. 


We can now apply the general principles developed in chapter 3 to derive the 
optimal Bayesian estimators for the depth and line fields. As a performance criterion 
we will use a mixed cost functional of the form: 

e.(/, I, /,?)= E (/.• - hf +,£(!- % - h)) (7) 

i£Lf j€.ln 

where Lf,Li denote the depth and line lattices, respectively. This error criterion 
means that the reconstructed surface should be as close as possible to the true 
(unknown) surface, and that we should commit as few errors as possible in the 
assertions about the presence or absence of discontinuities. 

Appllying the results of section 5 of chapter 3, we find that the optimal 
estimators will be the posterior mean for / and the maximizer of the posterior 
marginals for 1. Note that these estimates must be computed by averaging over all 
possible values of both / and l: 

fi = EEfiPf,o) 

f ‘ 

p u(<i) = E E TWMs) 

/ Ui=q 


4. Monte Carlo Algorithm. 

There is one serious difficulty that prevents us from applying directly the 
general Monte Carlo procedure that was derived in chapter 3 to the computation 
of these optimal estimates: since the depth variables are continuous-valued, if we 
discretize them finely enough to guarantee sufficient precision of the results, the 
computational complexity of either the Metropolis ar Gibbs Sampler algorithms 
will be very large. One way around this difficulty is to note that for any fixed 
configuration of the line field, the posterior energy becomes a non-negative definite 
quadratic form: 

u(f\i,g)= E (/.-/y) 2 + «E(/i-?y) 2 + ^ (8) 

jes 
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where a and K are constants (note that the first sum is taken only over those pairs of 
sites whose connecting line element is ’’off’, and the second one over the set 5). This 
means that the posterior distribution of the depth field is conditionally Gaussian, 
so that we can find the optimal conditional estimator /*(/) as the minimizer of (8) 
(for a Gaussian distribution, the posterior mean and the MAP estimate coincide). 
If / is identically zero (no lines), this function is strictly convex, and therefore it has 
a unique minimum. Let f* 0 be the corresponding global minimizer. For any fixed 
configuration /, the gradient of (8) is given by: 

—K ;!---- = 2 (ft - fjVij + 2 aqi(fi - g { ) (9) 


where 

Ni = {j : \i-j\ = 1} ; 

f .. — i _ /.. 

c *j — A 


Setting this gradient equal to 0, we find that any minimizer of U will be a fixed 
point of the system: 


(*) 


J 3 


(Jfc + 1) Ujf j + 


Yli£Nj ij “h 


if *ij + a( lj 7^ 

ieNj 


and ff +1 ^ = otherwise 



We note that the updating scheme (10) will produce a decrease in the value of 
U(f | /), regardless of the sweeping strategy. In a synchronous scheme (where all 
the sites are updated at the same time), the energy increment will be: 


A U(f | /) = t/(/(* +1 > | /) - t/(/W | /) = 

= -2 n E Ui +««?.)(/!‘ +1) - A k) f - 

t£L j€Ni 

- e l) uf ] - rwf - f ( r ] ) < ° 

ij OJiJ i 
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because U is non-negative definite. For an asynchronous strategy, where is 
obtained from /W by updating only the site i, we get: 


A U(f I I) = -4( £ hi + «<?.)(yf +11 - I ( i ] f < o 

JEN, 

Therefore, if we set 

/ (0) = fo (11) 


the dynamical system defined by (10) and (11) (with a given sweeping strategy) will 
be stable and have a unique fixed point f\. 

Note that, since U(f \ l) is always convex, f\ will be a global minimizer (see 
Luenberger (1973)), but in general it will not be the only one; there may be cases 
in which some region Q within which there are no observations is isolated from the 
rest of the lattice by the line process. In this case, any solution for which 

fj = constant, j e Q 

will also minimize U[f | l). However, for a fixed initial state /(°) the deterministic 
dynamical system (10) will always converge to the same solution, so that the 
configuration f*(l) is well defined. 

Let us define the set F* as: 

F * = {(/,0 : / = /,*} 


It is clear that, if f, l are the optimal estimates for our problem, we have that: 


- A 1 

l f,i)eF 


which suggests that we can constrain the search for the optimal estimators to this set. 
This can be done, in principle, by replacing the posterior energy with the function: 

U'(l) = U(fll) 
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(which depends only on /), and use the standard Monte Carlo procedures to find 
the optimal estimator /. To illustrate this idea, let us consider the following physical 
model: 

It is a well known fact the the steady state of an electrical network that contains 
only (current or voltage) sources and linear resistors will be the global minimizer of 
a quadratic functional that corresponds to the total power dissipated as heat (Oster 
et al, 1971). It is therefore possible to contruct an analog network that will find 
the equilibrium state of the depth field for a given, fixed configuration of the line 
process, i.e., that will minimize the conditional energy (8) (see Poggio and Koch, 
1984). This suggests a hybrid computational scheme in which the line field (whose 
state is updated digitally, using, say, the Metropolis or Gibbs Sampler algorithms) 
acts as a set of switches on the connections between the nodes of the analog network 
whose voltages represent the depth process. In particular, if /,• represents the voltage 
at node t, the hybrid network can be represented as a 4-connected lattice of nodes 
(see figure 15) in which: 

(i) A resistance (of unit magnitude) and a switch (controlled by the line 
element Z tJ ) is present in every link between pairs i , j of adjacent nodes. 

(ii) If an observation g { is present at site i, a current of magnitude equal to 
agi is injected to the corresponding node, which must also be connected 
to a common ground via a resistance of magnitude 1 /a (see equation 8). 

A direct application of KirchofF current law shows that at each node of this 
network we will have: 


Y (fi ~ /yX 1 ~ l ij) + = a< K9i 

jENi 

which corresponds to a fixed point of the system (10). In practice, there will always 
be parasitic capacitances which will prevent the instantaneous establishment of the 
equilibrium conditions. However, the time constant of the analog portion of the 
network may be made very fast, so that in fact, the probability distribution of the 
equilibrium states of this network will be Gibbsian with energy U*. 

This scheme can be used, in principle, to construct a special purpose hybrid 
computer for the fast solution of problems of this type. In a digital machine, however, 
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the exact implementation of this strategy will in general be computationally very 
expensive, since f* t must be computed every time a line site is updated. We will 
now present an approximation which has an excellent experimental performance, 
and leads to an efficient implementation. 

First, let us examine one iteration of the, say. Metropolis algorithm at a given 
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temperature T > 0 for the function U*. When a line site is visited and its state is 
updated, the corresponding increment in energy AUi is computed as follows: 

Suppose the line site ij is visited (the line between depth sites i and j). Let / t y 
be its current state, and I t -y the candidate state: 

t ij — -*■ i ij 

Suppose that the current state of the depth process is 

f = fi 

and let f* be the fixed point of (10) obtained when we replace / t y by 2,-y. Let us 
define: 

A * 

f = f; 
and 

av ; 7 = e v„(i) - v a (i) + e m - m 

C a :lij(zC a Cb’.lijECb 

where C a ,C b are the "a" and "6" type cliques defined in figure 13, and V a ,V bt the 
corresponding potentials. 

Since the depth process is at equilibrium, and we are changing only the element 
lij, we may assume that 

fp ^ ip for p 7^ i,j (12) 


so that 


A U x « AV-y + 


+ E 

m—t,j 


\2 


(/m f m)[ (1 Ikm) T a< 7m] 2(/ m /m)[ ^ /&(l ~ ^km) + C*<lm9m\ 

keN m keN m 

(13) 


Now, if the absolute difference |/,• - fj\ is small, / and f will be practically 
identical; on the other hand, if |/ t - - /y| is large, the changes in / at locations i and 
j will be relatively small with respect to this absolute difference. Therefore, we may 
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approximate A u] by the simple expression: 

A U] « AV;-y + (/y - - iij) (14) 

which depends only on the potentials of the cliques to which the updated line 
element belongs, and on the current state of the depth sites adjacent to it. If this 
approximation is to remain valid, the equilibrium condition on / must be mantained. 
This is done by performing M global deterministic iterations using (10) after every 
global stochastic update of the line process. We have found experimentally that the 
use of the approximate expression (14), and only three restoring iterations (M = 3) 
are sufficient to get a good convergence behavior. 

It is also possible to use assumption (12) and the fixed point condition of the 
system (10) to compute a more precise approximation to A U* (the corresponding 
formulae are derived in appendix 5.A). Our experiments indicate, however, that 
the simpler approximation (14) gives sufficiently good results, so that the increased 
complexity incurred by the use of this, more precise scheme does not seem to be 
justified. 

An important issue in the practical implementation of this procedure is the 
determination of the optimal temperature for observing the equilibrium behavior 
of the system. We have found that this can be done effectively in an adaptive way 
by starting the simulation at a relatively large temperature (say, T = 5) and slowly 
decreasing it until the network shows an adequate level of activity (measured, by the 
fraction of sites whose state is modified in one global iteration). We have found that 
a level on the order of 0.1 is adequate in most cases. This technique is similar to the 
Simulated Annealing method for finding the global minimizer of the energy, but 
in that case, the cooling of the system must proceed at a slower rate, and it should 
be continued until the level of activity is reduced practically to 0; if we proceed in 
this way, the final state of the system will correspond (approximately) to the MAP 
estimate. Note that Qmap^map) 6 F* too, so that the mixed strategy described 
above will also work in this case (see Marroquin, 1984). As we pointed out in the last 
chapter, if the signal to noise ratio is not too low, the configuration corresponding 
to the MAP estimate will be very similar to the optimal one OpmJmpm)- From 
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a computational viewpoint, however, the optimal estimator is preferable, since it 
exhibits a faster and more consistent convergence behavior. 

5. Experimental Results. 

We will now present some experimental results that illustrate the performance of 
the optimal Bayesian estimators for surface reconstruction tasks. In these examples, 
we assume that we have the following prior knowledge about the nature of the 
surfaces we are trying to reconstruct: 

(i) The region under consideration can be segmented into a small number of 
subregions. 

(ii) Within each subregion the surface is smooth (the gradient is less than 0.5). 

(iii) The boundaries between regions are piecewise smooth. There are relatively 
few corners. 

(iv) The average height of the discontinuities across boundaries is greater than 

0 . 8 . 

(v) The observations are corrupted by an additive white Gaussian noise process, 
and we have some estimate of its intensity. 

This knowledge is embodied in the model for the line process, and in the 
numerical value of the parameters. For our experiments, we have used a binary 
process with potentials given by equation (2). 

In the first set of experiments, we generated sparse observation points at 200 
random locations of a 30 X 30 rectangular grid. Figures 16, 17, 18 and 19 show 
(with height coded by grey level) the observations (a); the configuration obtained 
by interpolation with no boundaries (b); the final reconstructed surface (c), and the 
boundaries found by the algorithm (d), for: 

(i) A square at height 2.0 over a background at constant height = 1.0 (Fig. 

16). 

(ii) A triangle, with the same characteristics (Fig. 17). 

(iii) A tilted square plane (slope = 0.1) over a constant height background 
with white Gaussian added noise (<r = 0.1) (Fig. 18). 

(iv) Three rectangles at different (constant) heights over a uniform background 
(Fig. 19). 
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Figure 16. (a) Observations of a square at height 2.0 over a background at height 1.0 (a white 
pixel means that the observation is absent at that point), (b) Interpolation with no boundaries 
(c)Rcconstructcd surface.(d) Boundaries found by the Algorithm. 



Figure 17. (a) Observations of a triangle at height 2.0 over a background at height 1.0. (a 
white pixel means that the observation is absent at that point), (b) Interpolation with no boundaries 
(c) Reconstructed surfacc.(d) Boundaries found by the Algorithm. 

In many interesting cases, the observation sites are not randomly distributed, 
but rather tend to be clustered along certain curves. This is the case, for example, of 
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Figure 18. (a) Observations of a tilted square (slope = 0.1) over a background at height 1.0 
with added white Gaussian noise (<x = 0.1) (a white pixel means that the observation is absent 
at that point), (b) Interpolation with no boundaries, (c) Boundaries found by the Algorithm, (d) 
Reconstructed surface. 






i a ■ a a 




a. 


% 




( a ) (b) (c) (d) 


Figure 19. (a) Observations of 3 rectangles at heights 2.0, 2.0 and 3.0 over a background at 
height 1.0 (a white pixel means that the observation is absent at that point), (b) Interpolation with 
no boundaries, (c) Reconstructed surfacc.(d) Boundaries found by the Algorithm. 

the reconstruction of geological structures from seismic data, or of certain algorithms 
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(e) 


(f) 


Figure 20. (a) Observations of a square at height 2.0 over a background at height 1.0 with 
added white Gaussian noise (<r = 0.1). White pixels denote missing observations, (b) Interpolation 
with no boundaries, (c) Boundaries found by the Algorithm, (d) Reconstructed surface, (e) 
Perspective view of (b). (f) Perspective view of (d). 
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(c) (d) 


Figure 21. (a) Disparity data for a stereo pair of aerial photographs (data kindly provided by 
W.E.L. Grimson). (b) interpolation with no boundaries, (c) Boundaries found by the Algorithm 
(d) Reconstructed surface. 













for the reconstruction of surfaces from stereoscopic pairs of images, when the stereo 
matching is done only at the "edges" (places where the intensity gradient is large) 
detected in the images. The synthetic example of figure 20 illustrates this situation 
(here we include also a perspective representation of the recontructed surfaces, so 
that the difference between the smooth reconstruction and the optimal estimate can 
be fully appreciated). In figure 21 we illustrate a real example of this situation. It 
represents the interpolation of data obtained along the zero-crossing contours of 
the convolution of a stereo pair of aerial photographs (depicting the campus of 
the University of British Columbia) with a "Difference of Gaussians" operator, by 
Grimson’s implementation of the Marr-Poggio stereo algorithm [G4,M2]. We will 
come back to this example when we discuss the stereo matching problem in detail 
in the next chapter. 

We have also used a modified Simulated annealing scheme to get the MAP 
estimator for the same examples presented above (see Marroquin, 1984). The final 
configurations are very similar to the optimal ones, so we do not reproduce them 
here. With respect to the computational efficiency, it took, on the average, around 
450 global iterations (in a global iteration the state of the complete line field is 
updated, and the equilibrium of the depth field is restored) for the Simulated 
Annealing algorithm to converge, while for the OpmAmpm) estimator, only 250 
were needed. Also, in the latter, the behavior of the algorithm was more consistent 
in the sense that the difference in the results from successive runs with the same 
data were smaller than in the former case. 

6. A Fast Algorithm. 

The ergodicity of the "Gibbs chain" (the Markov chain generated by the 
Gibbs Sampler or the Metropolis algorithm at a fixed temperature) means that its 
time behavior mirrors the ensemble probabilistic structure. Since the probability of 
turning "on" a given line element depends on the difference in the values of the 
associated depth elements (i.e., on the current value of the gradient of the field / at 
that location), the configurations with active lines at points of high gradient will be 
generated first. These lines, in turn, will decouple the adjacent depth sites, increasing 
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the gradient even more, generating thus a positive feedback that stabilizes these 
configurations (the opposite happens in regions of low gradient, which prevents the 
formation of stable clusters of lines at those points). 

We can see, therefore, that the behavior of the Gibbs chain can be thought 
of, qualitatively, as performing in time a scale separation of the discontinuities 
of the image. This suggests the use of a deterministic scheme that performs the 
same separation, but compressing the time of the Gibbs chain. A simple way of 
implementing this idea, is to introduce a time varying coupling between the depth 
and line fields, and to allow only "downhill" moves (i.e., those with negative A U*) 
in the updating rules for the line process. Specifically, we compute the increment 
in energy associated with the update of the line element l {j at time t using: 

At/* = AKy + K{t){U - /,)%. - l {j ) (15) 

instead of equation (14), and accept the candidate state only if A U* < 0. The 
coupling strength K{t ) is computed using: 

K(t) = K 0 + ht 

( where K 0 and h are positive constants) until it reaches a given value K Ti and it is 
held constant at this value thereafter. The state of the depth process is updated, as 
before, using equation (10). K 0 must be chosen in such a way that with f — f* Q and 
U = 0 for all z, no lines will be turned "on" in the first iteration. This means that 
if we use equation (2) (with (3 = 1, and the values of (f> given in the corresponding 
tables) to compute the potentials, we must have: 

T . ^ 0.4 

K 0<— ( 16 ) 

where 

a = sup(/,- - fj) 2 
*• j 

On the other hand, the final value of K(t) (i.e., K T ), must be such that no lines are 
introduced in the smooth regions. Let 

6 = >n f (/. ~ fj? 
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C = sup(/; - fjf 
Sm 


where D is the set of neighboring pairs of sites such that each site belongs to a 
different smooth patch (i.e., pairs that lie across a discontinuity), and Sm is the 
complementary set of pairs of adjacent sites such that both sites belong to the same 
continuous patch. Kt must satisfy: 

0.25 0.25 

<K T < - 

o c 

Note that even if we do not know the precise values of a, 6 and c for a given 
problem, usually we can estimate them accurately enough to determine "safe” values 
for K 0 and Kt. The value of h controls the number of iterations needed for the 
algorithm to reach a fixed point; if h is too large and the observations are relatively 
sparse, we might get suboptimal solutions where regions with no observations are 
completely surrounded by lines, and therefore, adopt spurious constant values. We 
have found experimentally that usually 50 iterations, i.e., setting 

h K T -K 0 
50 

are enough to produce results that are indistinguishable from those produced by 
the Monte Carlo approximation. 

This scheme has an additional advantage: the optimal value of the coupling 
between the depth and line fields (the constant (3 in equation (2)) depends on the 
height of the discontinuities relative to the gradient in the smooth patches. It is, 
therefore, a free parameter of the Monte Carlo algorithm that must be adapted to 
each particular problem. Since in the deterministic scheme it is varied dinamically, 
its adaptation to each problem is automatic, provided that we choose K T and K 0 
sufficiendy large and small, respectively, so that the procedure has practically no 
free parameters. 
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Figure 22. (a) Coloring of the coupled line-depth lattice, (b) and (c) Elements whose state 
is stored in each of the two types of processors of a 4-connected parallel architecture. 

7. Parallel Implementations. 


Both the general Monte Carlo procedure of section 5 and the deterministic 
algorithm of the last section can be efficiently implemented in a parallel architecture. 
To study this implementation, we first note that the chromatic numbers (see section 
6.2 of chapter 3) of the graphs associated with the line and depth neighborhood 
systems are 4 and 2, respectively, which means that the coupled process has a 
chromatic number of 6. In figure 22 (a) we illustrate one possible "coloring”. 

The colors of the line process are represented by the numbers 1,2,3,4, and 
those of the depth process by white and black circles. The updating process can 
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be implemented in a 4-connected architecture such as the "Connection Machine", 
by assigning one processor to each depth site and its four adjacent line elements. 
We will thus have two different populations of processors, whose configurations are 
shown in figures 22 (b) and (c), respectively. 

in 

Each complete iteration consist on 6 major cycles: in the first two, the state of 
the white and black depth variables is respectively updated, and in the next four, 
the new states of the binary line variables stored in (say) the white processors are 
successively computed and transmitted to the corresponding memory locations of the 
neighboring black processors. Note that in this scheme we have some redundancy 
in the use of memory (each binary variable is stored twice), but the state of all 
the elements needed for each updating operation is always available from adjacent 
processors. 

7.1. Connection Machine Execution Time. 

The update of each depth site requires 2 (16-bit) multiplications; 5 additions 
and 10 1-bit comparisons, that is, about 600 cycles of a 1-bit processor. The 
computation of the increment in energy for the line process (equation 14) requires 
1 multiplication; 5 additions and 13 1-bit operations, that is 350 cycles. For the 
deterministic algorithm, we require 256 additional cycles for the multiplication 
by the variable coupling constant, while the exponentiation and random number 
generation needed for the Monte Carlo updating use about 2300 additional cycles 
(we assume that the updating of the coupling constant is done once every complete 
iteration in the host computer, and the new value broadcast to the whole network). 

Considering that the Monte Carlo algorithm requires about 200 iterations to 
converge, while only 50 are needed in the deterministic case, we get the following 
approximate estimates for the total execution time in the "Connection Machine" 
(using the same assumptions as in section 6.3 of chapter 3): 2.4 seconds for the 
Monte Carlo procedure, and 0.18 seconds for the deterministic algorithm. 

7.2. Analog Networks. 
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In chapter 4 we discussed the use of the "neural” networks introduced by 
Hopfield (1984) (see also Hopfield and Tank, 1985) for constructing analog systems 
that approximate the optimal estimators of binary fields. Since for a binary system, 
the TPM and MPM estimates are equivalent (see chapter 3), we can, in principle, 
replace the digital computation of the l field in the hybrid scheme discussed above 
(see figure 15) by a "neural" network that approximates the optimal estimate coupled 
with the analog "/" network (note that the switches must be replaced by analog 
devices that implement a multiplication). The time constant of the "neural" network 
has to be adjusted so that the "/" network remains in equilibrium and the search 
space is effectively restricted to the set F* (see section 4). 

To implement this idea, we must define a new energy function that depends 
continuously on /, and whose behavior is similar to U P for /,• e (0,1} (Hopfield, 
1985). One such function is: 

E(f, /) = *££ (/• - /y) 2 (! - hi) + cK £(/,• - «)* + 

»' j€Ni igs 

+«i£ £ (,( £ h- i) 2 + c 2 £i.(i-/.) + 

> 0,:.6C. t 

+«3 £ £ hij ( 23 ) 

c b :iec b jec b -{i} 


where K, a, ci, c 2 , c 3 are constants. 

Following the construction discussed in section 5 of chapter 4, we can now use 
an analog network that implements the dynamical system: 

dxii dE 
It = ~dU~ Ui 

li = O(ui) 

Where the function 0, which corresponds to the gain of the non-linear amplifiers 
that are at the nodes of the network, is as defined in equation (15) of chapter 4 (note 
that in this case the network also contains non-linear elements that act as analog 
multipliers). 
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We have performed numerical simulations of this method, and the results are 
similar to the optimal ones if the parameters of the system are selected appropriately. 
The system can be made practically data-independent by making the coupling K 
between the two networks (see equation (23)) time-varying, in the manner that 
was described in section 6. We have found that a reasonable set of values for the 
remaining parameters is: c x = .15; c 2 = .05; c 3 = 1.5. 

8. Discussion. 

In this chapter we have studied the problem of reconstructing piecewise 
continuous surfaces from sparse and noisy data. We showed that such surfaces 
can be adequately modeled by two coupled MRF’s: A depth field with quadratic 
potentials and a binary "line” field with sites in the dual lattice, and with potentials 
that represent our prior knowledge about the geometry of the curves that bound 
the smooth patches. 

We pointed out that a straightforward extension of the general estimation 
procedures derived in chapter 3 to this problem is computationally unfeasible, due 
to the continuous nature of the depth field. Therefore, we proposed a modified 
computational strategy that is based on the fact that the search space for the optimal 
estimates can be restricted to those configurations in which the depth field minimizes 
the (quadratic) conditional posterior energy for each given line configuration. The 
plausibility of this scheme was demonstrated by experimental results showing the 
reconstruction of both synthetic and "real" surfaces. 

We also derived, based on heuristic arguments, a fast deterministic algorithm 
with excelent experimental performance, and whose parameters can be made 
problem-independent, and discussed the implementation of all these procedures in 
parallel digital machines, and in hybrid and analog networks. 

It is interesting to compare the techniques we have presented with other surface 
reconstruction methods that handle discontinuities. The most successful of these 
(see Terzopoulos, 1984) are based on the idea of interpolating a smooth surface 
first and then, detecting the discontinuities by a threshold mechanism. We believe 


150 


that the method that we are proposing has some advantages over this scheme which 
justify its use in spite of the increased computational cost: 

(i) From a conceptual viewpoint, it is better to perform the interpolation and 
boundary detection tasks at the same time, rather than approximating 
an everywhere smooth surface first, since this operation hides the 
discontinuities that one then tries to find in the second phase. 

(ii) In our method, the values of the parameters depend only on the average 
height of the jumps that one wants to consider as boundaries in the 
reconstructed surface , and thus, they are independent of the location of the 
observations. If these are sparsely located, even when the discontinuity is 
relatively large, the threshold method may fail. 

(iii) A priori knowledge about the shape, orientation and position of the 
discontinuities can be easily incorporated by choice of the potentials of 
the line process. This fact makes our method particularly promising for 
integrating information from qualitatively different sources into a single 
unified estimation procedure. 

(iv) The same algorithm can be used for surface interpolation, noise elimination 
(smoothing) and boundary detection. 

We will now study a related problem: the reconstruction of surfaces from 
stereoscopic images. 
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Appendix 5.A 


HIGHER ORDER APPROXIMATION TO A U* 


In this appendix we describe a higher order approximation to the energy 
increment A U* (see section 4 of chapter 5). We will compute A U* using: 


a u ]« av; 7 + 


+ E 


(/m f ‘m)[ 53 (1 l/crn) + a( lm\ 2 {J m fm)[ 53 fk{l ~ lkm ) + a 9rn9m} 

k£Nm k£zN m 

(i) 


using the assumption: 

A 

f P ^ fp for p 7 ^ h i 

the new equilibrium configuration / can be estimated by the following formulas, 
which correspond to the fixed point of: 


f[k+ 1 ) 
J 3 


T,i£Nj + aqjgj 

Y2i£Nj ^ij T 



when / P ,p^ i, j is held fixed: 


Let: 



for k,m — i, j 
otherwise 


'Yra — 


53 ^km 4" otq m 
keNm 


The new equilibrium configuration will be a fixed point of (10), and therefore, it 
will satisfy: 



' 1 m 


E 

keN m 


t'kmfrn 4" &9m9m 


for m = t, j 
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If= o and 7 i 7 j 7 ^ 1, we get 


A 

/< 


7i7j 


T 7y 


^ y fk^ik “I" QQiQi 
,keNi-{j} 


T fk^jk T CtQjQj} 

k£Nj — {i} 


fj = — 
7j 


]C /jfc2y* + ctQjqj 
keNj 


_ A 

if i {j = o and 7 t - 7 y = 1 , it means that there are no observations, neither at i nor at 
and that these two sites are isolated from the rest of the lattice by line elements. 
Therefore, we use: 

a - ft + fj 

fi = fj = —5- 1 


Finally, if = 1, we put 


/ m = 


1 

7m 


A 

'kENm fm^km T °LqmQrn 


l/m» 


if 7m 0 
if 7m == 0 


for 


m 


'> j 


Besides, if the move from l to 2 is accepted by Metropolis criterion, we replace 


A 

fm fmi fot 771 = = 4, J 

As described in chapter 5, after all / sites have been updated, M restoring 
iterations using equation ( 10 ) of that chapter should be applied. 
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Chapter 6 


SIGNAL MATCHING 


1. Introduction. 

In all the estimation problems we have studied so far, the posterior energy 
function had the form: 


u r (f; g) = u 0 (f) + £ <&,(/,, 9i ) (1) 

i 

where U 0 (f) corresponded to the MRF model for the field /. The functions 
whose precise form depended on the particular noise model, were non-decreasing 
functions of the distance between f { and g { (see equation (2) of chapter 3): 


= - In P ni (*-\ 9 i,Hi(f)) 


There are some cases, however, when the conditional probability distribution 
of the observations P g \/{g\ f) is multimodal (as a function of /) which causes the 
functions to be non-monotonic, so that the solution to the problem remains 
ambiguous, even if the observations are dense, and the signal to noise ratio arbitrarily 
high. To illustrate this situation, we will study an important instance of it: the 
"signal matching" problem, whose one-dimensional version is as follows: 

Consider two one-dimensional, real valued sequences h L ,h Ri where h L is 
obtained from h R by shifting some subintervals according to the "disparity sequence" 
d : 

M*) — + ^*) 

with 

di€Q = {—m, —m + 1,..—1,0,1,..m} 
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The signal matching problem is to find d given h lj} h /{ . (In a more realistic 
situation, we do not observe h Lj h 1{ directly, but rather some noise-corrupted 
versions g^gu). Some interesting instances of this problem are the matching of 
stereoscopic images along epipolar lines (Marr and Poggio, 1976); the computation 
of the dip angle of geological structures from electrical resistivity measurements 
taken along a bore hole, and the matching of DNA sequences. 

To make the discussion more specific, we will consider a simple example, in 
which the sequences h L , h R are binary Bernoulli sequences; we will assume that the 
noise corruption process can be modeled as a binary symmetric channel with known 
error rate, and that d is known to be a piecewise constant function. A well known 
instance of this problem is the matching of a row of a random dot stereogram with 
density p (Julesz (I960)), when the components of the stereo pair are corrupted by 
noise. 

The stochastic model for the observations is then constructed by assuming that 
the right image is a sample function of a Bernoulli process A with parameter p : 

g H (i) = A(z) 


The left image is assumed to be formed from the right one by shifting it by a 
variable amount given by the disparity function rf, except at some points where an 
error is commited with probability e. Note that some regions that appear in the right 
image will be occluded in the left one (see figure 23). The ’’occlusion indicator” <j> d 
can be computed deterministically from d in the following way: 



if d t _jfc > di + k, for some integer k 6 (0,m] 
otherwise 



The occluded areas are assumed to be "filled in” by an independent Bernoulli 
process B. The final model is then: 


1 9n{i + di ), 

1 - 9li{i + di ), 
Bp[il 


with prob. 1 - e, if <j> d {i) = 0 
with prob. e, if <f> d (i) = 0 
with prob. 1, if <f> d (%) = l 
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i-l+d: 


i-i 


i+d 



^Lines of Constant 
Disparity 


Figure 23. Occluded Regions: The horizontal and vertical axis represent points in one row 
of tiic left and right images, respectively. Matching points arc represented by black circles. Any 
match in the shaded region will occlude the point i 

Note that in the two-dimensional case, the index i denotes a site of a lattice, and 
therefore it can be represented as a two-vector ( 21 , 1 * 2 ) whose components denote 
the column and row of the site, respectively. To simplify the notation, we will adopt 
the following convention throughout this chapter: when a scalar is added to this 
vector index (as in gn(i + c/ t ) and d { + k ), it will be implicitly assumed that it is 
multiplied by the vector (1,0) (so that the above expressions should be understood 
as gn{i + «-,0)) and <^+(*, 0 ), respectively). Using this convention, the observation 
model of equation (3) can be applied either to the one or to the two-dimensional 
cases. 

Notice that even if the observations are noise-free (e = 0) the solution of the 
problem remains ambiguous, and it cannot be uniquely determined unless some 
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prior knowledge about d (for example, in the form of a MRF model) is introduced. 
The use of a MRF model in the stereo matching case, corresponds to a quantification 
of the assumption of the existence of ’’dense solutions” (this term was introduced 
by Julesz (1960), and essentially corresponds to the assumption that the disparity d 
varies smoothly in most parts of the image; see also Marr and Poggio (1979)), and 
the use of the occlusion indicator corresponds to the "ordering constraint” (i.e., the 
requirement that if % > j, then i + d { > j + d jy see Baker (1981); we put (f> d = 1 
whenever this constraint is violated). 

2. Bayesian Formulation. 


To formulate the estimation problem, we will consider the sequence g L as 
"observations”, while g R will play the role of a set of parameters. Thus, from (3), 
we have (assuming, for simplicity that p = J): 


P[9L{i) = k | d } g R ) = P g \ d {k) = 

1 ~ e, if <j> d (i) = 0 and g R (i + rf,) = k 

e, if <f> d (i) = 0 and g R (i + d{) 7 ^ k 

2 ’ ^ = 1 

The posterior distribution P d \ g will then be; 



Pd\ 9 (d) - 


P d ' Pg\d 

~~p7~ 


1 

~B~ ex P 

r 9 


0 *,y 


• Jim - € ) S i9L{i) - 9R{i + di)) 4* 


+ £ (l - %i(>) - an(i + diMl - m) + 


where 

u , Jl. if 1 = 0 

to, otherwise 

As a prior model for the disparity field, we may use a first order MRF with 
generalized Ising potentials, such as the one presented in chapter 4. Other models 
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may also be used, including the coupled depth and line fields that we discussed in 
the previous chapters. For the present, let us assume that the simpler lsing model is 
adequate. Note that even when the matching problem is one-dimensional (we are 
asuming that there is no vertical disparity between the images, so that the matching 
can be done on a row-by-row basis), the two-dimensional nature of the prior MRF 
model for the disparity introduces a coupling between matches at adjacent rows. 
The posterior energy is: 

Ur{d; g) = ~ Y, V i d i :» d i ) “ MK 1 - e)S(g L (i) - g lt (i + d { )) + 

10 *J i 

+e(l - S(g L (i) ~ 9n(i + *))](1 - Hi)) + 

Using the fact that for any a, 6 ^ 0 

ln[a6(:r) + 6(1 — <5(a:)] = S(x) In a + (1 — 6(i)) In 6 

we can write an equivalent expression for U P (modulo an additive constant): 

Ur(d; ff) = 1- E m, H + 5 E Mi) 1" 2 + 
lQ iJ 2 t 

+ 2 - gR{i + di)) ( 4 ) 

where 

“ = ln (rb) 


3. Optimal Estimator. 

It is possible to apply the general Monte Carlo algorithms developed in chapter 
3 to approximate the optimal estimate d with respect to a given performance measure 
(such as the mean squared error). Their use in this case, however, is complicated by 
the introduction of the occlusion function <j> d in the posterior energy: the size of the 
support for this function equals the total number of allowed values for the disparity 
(see equation (2)). If this number is large, the computation of the increment in 
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energy, or of the conditional distributions (if the Gibbs Sampler is used) may be 
quite expensive. In many cases, however, the size of the regions of constant disparity 
is relatively large compared with the size of the occluded areas. In these cases, one 
can approximate the posterior energy by: 


U r (d) = 




and increase significantly the computational efficiency (we have successfully used 
this approach to reconstruct the disparity of random dot stereograms). 

In the one-dimensional case, it is also possible to extend the dynamic 
programming methods described in appendix 4.B to compute the MAP estimate; 
this extension is described in appendix 6.A. 

An alternative approach to the solution of this problem is to implement the 
local constraints, generated by the prior MRF model, directly in a deterministic 
"cooperative network" of a given form (a "Winner-Takes-All" network) whose 
fixed point will correspond to the optimal solution. This will be done in section 
6. First we present, in section 4 the definition of a "Cooperative Algorithm", and 
describe and analyze, in section 5, the previous work that has been done in this 
connection. 

4. Cooperative Algorithms. 

Consider the two-dimensional signal matching problem defined in section 2, 
and suppose that to each site i of the lattice ft we associate a set of binary variables: 
{fi,d,d € Q} (we will call this set the "i th column" of the network /, and the set: 
{/ i,d) i € ft}, the "disparity layer d" of the same network). 

If a particular variable f ijd = 1, it means that we assign to site i the disparity 
d (note that more than one disparity may be assigned to a node at a given time). 

A "Cooperative Algorithm" (Marr and Poggio, 1976; it is also known as a 
"Cellular automata"; see Wolfram, 1983) is a rule for updating the state of the 
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network /. It can be represented formally as: 


f i4 {t + 1) = F itd {f(t), t) 


with the additional requirement that the interactions should be local, that is: 


s £ Q},t ) 


where N { is the (two-dimensional) neighborhood of site i G ft. The idea is to define 
the functions F (i.e., the connections of the cooperative network) in such a way that 
the following local constraints are implemented: 

(i) Compatibility with the observations: Each element f i>r should receive 
an "excitatory" external input proportional to the conditional probability 

Pr (^W = dn{i + r ) | d{ = r). , 

(ii) Smoothness: This corresponds to an implementation of the MRF prior 
model for the disparity: the likelihood that an element f itd is turned "on" 

0-e.. is set equal to 1) should increase if the elements {f Jtd , j £ N { } are 
"on" ( N{ is the neighborhood of i in the Markov model), so that excitatory 
connections should exist between these elements. 

(iii) Uniqueness: Since in the final configuration f* one and only one element 
of each column d £ Q} should be equal to 1, each element should 
have "inhibitory" connections with the other elements of the same column. 

The operation of the network will be Synchronous if all its elements are updated 
in parallel at the same time, and Asynchronous if they are updated sequentially, 
one at a time. Note that one synchronous iteration is equivalent to |/| (the number 
of elements of the network /) asynchronous ones (we will refer to |/| succesive 
iteiations as a Global Iteration), and that the evolution of the asynchronous network 
will depend, in general, on the order in which its elements are updated. 


5. "Linear Threshold" Networks. 


The first successful application of this approach (although not formulated in 
probabilistic terms) is the algorithm developed by Marr and Poggio (1976) for the 
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stereo disparity computation. They proposed a binary network of the form: 


with pi 


fi(t + 1) = v{pi) 


Y + vi -o 


if J G fl X Q j 


(6) 


*(p) 


T, if p > 0 

< 

. 0 , otherwise 

wij satisfying w { j = for all i,j X Q 


and /,• G {0,1}, for all i 

The parameters Wij,rn and 0 must be chosen in such a way that the constraints 
to the solution of our problem are implemented locally. In particular, the smoothness 
constraint is implemented by defining: 


w x,d, y ,d = T for y G N x ; x, y £ ft 


where N x is an excitatory neighbourhood of x. The uniqueness constraint, by: 


W x,d,y,d’ — e > for (t/,cT) G M Xf d 


with M X)d an inhibitory neighbourhood corresponding to multiple matches at x (see 
Marr and Poggio (1976) for a precise definition of these neighbourhoods), and 

w x,d, y ,d' — 0 elsewhere. 


The compatibility with the observations is enforced by putting 

0 _J1, if gic(x + d) = g L {x) 

lx,d - / X,d - , . (7) 

10, otherwise 

Although it has not been possible to this date to find a rigorous proof for the 
convergence of this algorithm, numerical experiments and a probabilistic analysis 
(Marr et. al., 1978) show that the synchronous network defined above will converge to 
reasonably good solutions for random dot stereograms portraying piecewise constant 
surfaces. However, this scheme has several problems (although some modifications 
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to get around them are suggested in Marr and Poggio, 1976 and in Marr et. al., 
1978): 

In the first place, the quality of the results degrades very fast as the density of 
the tokens in the stereogram decreases. Besides, it is not clear how to extend this 
formulation to the more interesting cases of slowly varying disparities, and different 
types of tokens placed in points that do not correspond to a regular lattice. 

5.1. Asynchronous Algorithms. 

We now consider algorithms of the form ( 6 ) that operate asynchronously. In 
this case, it has been shown (Hopfield, 1982) that if we choose the parameters in 
such a way that p t - is never 0 (this can be done, for example, if wij and 77 ; are 
integers, by giving 0 a non-integer value), the "Energy" function: 

E[f) — ~9 jfifj ~ fi(v% ~ 0) (8) 

1 *»y » 


will decrease monotonically at every global iteration of the asynchronous algorithm 
in which the state of every element is updated, unless the network is at a fixed 
point. 


It is interesting to note that with the parameter definitions given above for the 
stereo problem, the term 


nfx,d X* fy,d 

* VEN X 


in ( 8 ) will be negative only if all the spatial neighbors of the cell x on the same 
disparity layer are "on", and therefore corresponds to a smoothness constraint.The 
term 



corresponds to the compatibility with the observations, and the remaining terms: 
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H fy.'l' 

y,d'E M x ,d 


may be considered as an implementation of the uniqueness constraint, since their 
minimization requires that we have as few "on" cells as possible, and it penalizes 
explicitly the local non-uniqueness of the solution. 


5.2. Experimental Performance. 

To study the performance of these algorithms, we implemented a simulator 
of both the synchronous and asynchronous networks. The ’’stimulus" used for the 
set of experiments performed, was a random dot stereogram portraying a square of 
21 X 21 elements floating at disparity -2 in front of a flat background at disparity 0. 
Figure 24 shows this stereogram and the fixed points obtained by the synchronous 
and asynchronous algorithms. 

In both cases, the behaviour of the algorithm shows two distinct phases: In the 
first iteration, most of the elements that are "on" on the wrong layers (and some on 
the correct ones) are turned "off' (see figure 24-b). As a result of this, at succeding 
iterations, the probability of having a cluster capable of growing is relatively high 
for the correct regions, which begin to fill in, and very small for the wrong ones, 
for which the remaining "on" cells are turned "off". 

This form of operation causes that the precise shape of the boundaries between 
regions will depend on the exact shape and location of the random clusters that are 
formed after the first iteration on the correct layers. Also, it is easy to see that the 
form of the inhibitory neighbourhood (see Marr and Poggio (1976)) causes the cells 
lying on wrong layers along a narrow band near the edges of the background to be on 
the average less inhibited by the "on" elements in the correct layers (which in turn 
are less stimulated) than the interior points, making thus more likely the formation 
of wrong stable clusters in these regions. This effect is more pronounced in the 
asynchronous case, since a wrong cell that is left "on", can increase the excitation 
of a neighbouring one on the same global iteration, increasing the likelihood of a 
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Figure 24. (a) Random dot stereogram portraying a 21 x 21 square at disparity -2. (b) 
State of the network after one iteration of the synchronous algorithm, (c) Fixed point for die 
Synchronous Algorithm, (d) Fixed point for the Asynchronous Algorithm. 












stable cluster, whereas on the synchronous case, all the cells of the cluster must be 
left "on" at the same time. 

For the values used for the parameters (e = 2,0 = 3.5 ; see Marr and Poggio, 
1976) the energy defined in (8) decreases monotonically at each global iteration 
of the asynchronous network, and thus, it converges to a configuration that is a 
local minimizer of this function. The correct solution will also correspond to a 
(different) local minimum; it is interesting to note, however, that in general it will 
not be the global one. It is easy to show, for example, that if the random dot 
stereogram portrays a region that has a ratio of area/perimeter less than a critical 
value (for the current value of the parameters this critical ratio is « 13), this region 
will not be distinguished from the background in the configuration that globally 
minimizes the energy. This means that the use of simulated annealing to minimize 
(8) will not necessarily improve the solution; however, we have found that after the 
deterministic algorithm has converged, a few iterations of Metropolis algorithm at 
a moderate temperature (^ 1) may be very effective for removing the clusters at 
wrong layers. This is illustrated in figure 25. 


6. Winner-Takes-All (WTA) Networks. 


Linear threshold networks are not the only form of local implementation of the 
constraints generated by the probabilistic formulation of our problem. A different 
possibility is to associate with each column {/ X)C /, d £ Q} a binary "Winner-take-all" 
synchronous network: 

The input u(x,d) to each cell corresponds to the excitatory input in the linear 
threshold case, that is, to the local implementation of the smoothness constraints 
and the compatibility with the observations. 


The inhibitory terms (the uniqueness constraint) are implemented in the form 
of a WTA mechanism. The output (the new value of f X)d ) is given by: 



if u(x, d) = max^Q u(x, d ’) 
otherwise 



165 



(c) 


Figure 25. (a) Fixed point at T = 0. (b) State after 4 iterations at T = 1. (c) Fixed point at 
T = 0 with (b) as initial state. 

This means that f x ^ will be "on” at time £ +1 only if it is maximally stimulated 
with respect to all the other elements in the same column at time t , and if it is 
’’compatible enough” with the observations (see figure 26). 
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Column 

(WTA Mechanism) 


Figure 26. Winncr-Takcs-All network (see text). 

This design has several advantages : 

L For dense stereograms, we will show that it converges to the correct solution 
in a small number of iterations. 

2. For sparse stereograms, the algorithm will give, with high probability, the 
correct disparity at every location in which a matching token is present 

3. It exhibits a good performance with natural images portraying piecewise 
constant surfaces. 

4. It is not necessary to process the whole domain n at the same time; a 
complete representation may be built up by defining local networks corresponding 
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to overlapping subregions that cover fi. This feature enables the algorithm to process 
arbitrarily large images. 

5. It can be extended in such a way that it can handle more complex situations, 
such as transparent and piecewise smooth surfaces. 

Qualitatively, this improved performance can be explained as follows: 

Unlike the linear threshold design, in the first iteration the WTA algorithm will 
only turn "ofT cells that do not lie in the correct disparity layers. This will cause 
the cells that lie at the boundaries of clusters at the wrong layers to lose, in the 
subsequent iterations, against the corresponding strongly stimulated cells that lie in 
the interior of the "correct" regions. This will result in a progressive shrinking of 
the wrong clusters, and will end up with their disappearance. 

This results in a faster convergence, since the size of the clusters that have to 
be killed is in general smaller than the size of the regions that the linear threshold 
algorithm has to fill in. Also, the boundaries between constant disparity regions will 
be more accurately localized. 

The only situation in which this behavior will not take place, is when there is 
a significant overlap (due to accidental correlations in the images) between regions 
lying at different depths. In this case, the algorithm will not be able to solve the 
ambiguity correctly based only on smoothness considerations, and it will locate the 
boundary at a position, within the region of overlap, which will depend on the 
detailed shape of this region. Also, the solution will not be so clean in this case; a 
few cells, corresponding to different disparities at the same spatial position, may be 
left "on" in the final state (limit cycles involving some of these few cells are also 
possible). 

This type of ambiguity (accidental overlap) is relatively frequent in sparse 
stereograms. However, the regions of overlap are typically "blank" regions (i.e., 
without tokens), and the algorithm will give the correct disparity at all token 
locations. 

We will now make these considerations more precise. First, we will need some 
definitions. 
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1. fi will be defined as the set of points lying on a finite square lattice. 


2. We will use a second order MRP with Ising potentials as the prior model for the 
disparity field. Therefore, for each iGfi, we define its neighborhood as: 


Af x = np|{y : 0<|x — i/| < 2} 



3. Given a region i? C fi, we define the set of its interior points (with respect to 
N x ) I(R) as the set of points in R such that all its neighbors also belong to R : 


I(R) = {x£ i?:|AT x n^| = |W X |} 


In a similar way we define: 

I%R) = I(I(R)) 

and so on. We call the points in R that are not interior: x £ R- /(«), Boundary 
points of R. We will say that a region R is connected if, given any two sites i, j 6 R, 
we can find a sequence of sites {t = i 0 , i u ..i p = j }, with t* 6 R for k = 1 ,..p, 
such that i k e N ik+l for k = 0,.. p - 1. 

I 

4. Given a region R C fi, we define its Diameter D(R) (with respect to N x ) as the 
smallest integer such that: 

I D M +l (R) = 0 

Alternatively, if we define an algorithm that deletes all the boundary points of a 
region at every step, the diameter of the region is the minimum number of steps 
necessary to completely delete the region. 

5. The initial state of the network will be given by: 

f o J 1 ’ 9r{x + d) = g L (x) 

Jx,d = , . (11) 

10 , otherwise 


6 . The WTA algorithm for this problem will have the particular form: 



if u Xl d(t) = max^Q u x>d >(t) 
otherwise 



7. We will assume that the set n can be covered by M +1 non-overlapping regions: 

n = ^U-'-U^mU 0 


and that the correct solution (i.e., the way the stereogram was generated) consists in 
assigning to every point in Ri the depth <•: 

f x ,di = 1 iff x £ ft. 

The set O corresponds to the union of all the regions that are occluded in the left 
image (see figure 23), and therefore, for every x £ O, any depth assignment will be 
considered "correct". 

8. Since we are assuming that the observations are perfect, the loading rules 
guarantee that 

= 1 for every x £ Ri 
However, in many cases we will also have: 

/2 ,4 = 1 for some x £ ft t - and dj ^ d { 

j . i , \ 

due to accidental correlations in the images. A connected set Wj defined as: 

Wj — {x : f Q xd . — 1 and x £ R { for some dj ^ <} 
will be called a wrong cluster on layer j of R{. 

9. We will say that a stereogram has well defined boundaries if there are no large 
wrong clusters overlapping the boundaries between adjacent regions. This means 
that every non-occluded point must have at least as many "on" neighbors at time 0 
on the correct layer as in any other layer, i.e., for every region R k and every point 
X £ Rk. 

E fy,d k > E fy,d for all df^d k (13) 

v€ N, y£N, 

10. A stereogram will be said to be unambiguous if for every region R{ and every 
wrong cluster Wj there is at least one point x £ Wj n Ri which has less "on" 
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neighbors at time 0 on the wrong layer dj than in the correct one d { , i.e., 

E fU < E fU (14) 

We can now establish the following result: 

Convergence Theorem: Given an unambiguous random dot stereogram with perfect 
observations (0 error rate) portraying M non-overlapping regions of constant depth 
with well defined boundaries, the WTA algorithm (12) with a >8 will converge to 
the correct solution in K iterations, where K is the diameter of the largest wrong 
cluster in Q. 


Proof: 


1) First, we note that condition (13) guarantees that all the cells on the correct layers 
(which, by (11), are "on" at time 0) will remain "on" at time 1. 

2) Condition (14) and the definition (12) guarantee that for every wrong cluster Wj 
on every region R { there will be at least one point x that will be turned "off" in the 
first iteration. Then, for all points y £ N x f) Wj f) R{ we will have: 


E / 

Z£Ny 


(l) 

z,dj 


< E / 

Z£Ny 



which implies that = 0. 

A recursive application of this reasoning establishes the theorem, i 

Remarks: 

1. For occluded regions, there will be no large clusters of "on" cells in any layer of 
/°, and since the form of (12) precludes the growth over regions with f° = 0, if 
there are any isolated points for which f° xd = l, they will remain "on" in f* (the 
fixed point of (12)); otherwise, f* = 0 uniformly over these regions. 
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2. If the algorithm has ambiguous boundaries, we can still use this theorem to 
guarantee the convergence of the WTA algorithm to the correct solution outside the 
overlap regions. It is clear that if we define new non-overlapping regions R\,.. .R' m 
with non ambiguous boundaries, and include the overlap areas in the set O, the 
theorem will guarantee that we get the correct solution in the new regions. In the 
overlap areas, the stable state of the network may include some leftover ambiguity 
(fx,d — 1 for more than one d ), and even limit cycles involving a few cells. However, 
these problematic areas will be confined to layers of unit width along the portions 
of the (final) boundaries that lie inside the overlap regions. 

3. The probability of finding wrong clusters in a binary stereogram is related to the 
probability of finding a repeated subsequence on a Bernoulli sequence of length 
equal to the total number of disparity layers, and decreases exponentially with the 
number of cells belonging to each of these clusters. For dense sterograms (generated 
by a Bernoulli process with parameter p = 1), the probability of finding a wrong 
cluster that contains a square of m cells per side can be bounded by 

Pr(cluster) < 

where N D is the number of disparity layers, and |fi| is the total number of cells in 
the lattice. On the other hand, a cluster of diameter k must contain at least a square 
of side 2k +1. Thus, if Np — 7 and |H| = 64 2 , for example, we can guarantee that, 
for dense stereograms, the algorithm will converge to the correct solution in less 
than 3 iterations with probability > 0.99. 

4. For sparse stereograms, wrong clusters involving only ’’blank" areas will be very 
common, but those containing active tokens will be rare. This fact, together with 
remark 2, mean that, with high probability, tha WTA algorithm will find the correct 
disparity at all the sites that have active tokens. This has been confirmed by our 
experiments. 

5. Algorithm (12) will not grow regions into occluded (uncorrelated) areas. 
Psychophysical experiments show that these areas should be included with the 
adjacent region that is at the greatest depth. It can be verified that an algorithm 
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such as the following: 


fx,d(t +1) 


i, if E y e/v x /y,a(0 > 2 /x )C i’(0 E y e/v, f y ,d'(t) , d’ d 
10, otherwise 


with / Xfrf (0) = / X|rf (the fixed point of (12)), will converge to a solution in which 
these regions are correctly filled in, provided there are no wrong clusters in the 
occluded regions, and that each layer of constant d is allowed to converge separately, 
starting with d = dmin — rain [d £ Q). 


6. Note that even when (xj, x 2 ) £ Q, (x\ + d, x 2 ) may lie outside n and so, if we load 
the network using (11), some cells near the boundaries of 0 may remain undefined, 
and (12) may give incorrect results. Therefore, we implicitly assume the existence 
of a larger region fi 0 Dfi such that for all x £ n, f° yd is defined for y £ N x U{x} 
and deQ. Also, the operation of (12) should be understood in a modified sense, 
so that f x>d (t) = f Q xd for all x £ H 0 - O, all deQ, and all t. 


A useful corollary establishes that it is not necessary to process all fi at the same 
time, but that a complete representation can be built up by defining local networks 
corresponding to windows S C Q, provided that there is enough overlap between 
them. In particular, we will show that if we load the local network S in such a way 
that its initial state coincides with the initial state of the complete network at those 
cells, and if the algorithm operates only on the interior points of S, keeping the 
state of the boundary points fixed, then the final state of the local network at these 
interior points will correspond to the optimal solution: 

Let d) and f l s (x } d) be the state of the (x, d) cell at time t in the complete 
and local network respectively. We have: 

Corollary 1: Suppose the conditions of the convergence theorem hold in 0, and 
consider a set S C fi such that the stereogram is not completely ambiguous in 
Si = J(S) (i.e., condition (14) holds for every x £ Si). Suppose that we load the 
local network f s in such a way that for every x £ S,f° s {x,d) = f^{x,d), for all 
deQ. 
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Then, algorithm (12), modified in such a way that fs{x,d) = f" s {x,d) for all 
t , all x £ S — S i, and all d £ Q, will converge to a fixed point f* s for which 
f* s {x,d) = fu(x,d) for all x belonging to unoccludcd regions inside Si. 

Proof: 


Consider a region R of constant disparity d such that R' = RftSi ^ 0, and let 
Bi be the intersection of R with the boundary of S\. For every point x £ R’ — B[, 
fs{x,d) = 1, by the same arguments as in the convergence theorem. For x £ B { , 
fs{x,d) = 1 too, since fs{y,d) = fn{y,d) for y £ N x , and (13) holds in n. 
Therefore, for every x £ R\ f' s (x, d) = 1 

On the other hand, for any wrong cluster W d > C i?’ in layer d } ^ d , since the 
stereogram is not completely ambiguous inside Si, there will be at least one point 
x £ W# such that f l s {x, d’) = 0. Reasoning as we did before, we have that for all 
points y £ N x \~) W# fi R ’ we will have: 


E / 

z£Ny 


( 1 ) 

z,dj 


< E / 

z£Ny 


( 1 ) 

z,di 


which implies that f^ d . = 0. 

Applying this reasoning recursively, we get, for every x £ R\ that f* s {x, d) = 1, and 

fsi x > d ') = 0, d’ d, which, together with the convergence theorem, completes the 

proof, i 

Note that 5 — Si defines the overlap that should exist among local windows, so that 
the complete representation, defined by 

n = u4 ,) 

i 

is correctly formed. 

6.1. Numerical Results. 
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To test the performance of algorithm (12) with random dot stereograms, a 
simulator was implemented in a Symbolics 3600 computer. Figure 27 shows the 
fixed points corresponding to dense and sparse stereograms portraying a pyramid. 
As predicted by the theory, the convergence to the correct solution is fast (less than 
4 iterations) in both cases. In the case of the sparse stereogram, the boundaries are 
slightly misplaced, but, as can be verified by direct inspection of the stereogram, 
all the dots are correctly located. The fixed point corresponding to the synchronous 
operation of (6) (obtained after 11 iterations) is also presented, for comparison. As 
we can see, the WTA algorithm (12) converges much faster to a much more precise 
result. 

7. Recontruction of Real Images. 

To apply this algorithm to the processing of real images, there are some 
modifications and extensions that should be made. 

7.1. Neighborhood size. 

It is possible to increase the robustness of algorithm (12) with respect to the 
presence of noise in the images by increasing the size of the excitatory neighborhood 
(i.e., by postulating a more global MRF prior model) and decreasing the value of 
the parameter a. This increased robustness is traded off by a decrease in resolution: 
small correct regions may be trated as ’’noise", and therefore disappear from the 
solution. Also, the shape of the piecewise constant regions may be altered (corners 
may be rounded and small concavities "filled in"). 

7.2. Token Selection. 

The simple rule (11) is adequate for measuring the compatibility with the 
observations in the case of a synthetic image (such as a random dot stereogram). 
However, it will not work in the case of continuous-toned images of real objects. 
The reasons for this failure are manifold: the distribution of the reflected light 
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(e) 


Figure 27. (a) Dense Stereogram (density = 0.4) portraying a pyramid, (b) Fixed point for 
algorithm (12) (c) Sparse stereogram (density = 0.1) portraying a pyramid, (d) Fixed point for 
algorithm (12). (c) Fixed point for the Synchronous algorithm (6). 
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varies as the viewpoint is changed (particularly the specular component), and the 
two retinas (cameras) may have different point spread functions, and be affected 
by independent sources of noise. This means that the model for the observation 
process given by equation (3) should be replaced by another that reflects the process 
of formation of natural images in a more realistic way. The use of a better model 
will cause the term f Q xd in equation (12) to be replaced by a different compatibility 
measure rj Xtd which is obtained by first preprocessing the right and left images using 
an operator T whose output should be, ideally, invariant under the changes in 
viewpoint, optics, etc., and then computing a suitable defined distance D between 
the two processed images: 


Vx,d = D(Tg R (x -f d), Tg L (x)) 



(note that rj may be continuous-valued). 
The new WTA algorithm will be: 



if u Xtd (t) == max^Q u x>d ’{t) 
otherwise 


u x ,d(t) = arj Xfd + PnifM, x, d) 


( 16 ) 


The operator P N is generated by the enlarged MRF model, and in general it will 
represent a weighted average of the values of the field in the enlarged neighborhood: 

P N {f,x, d) = £ c(\x — y\)f x ,d (17) 

yen. 


where N x is the extended neighborhood of x and c(-) denotes a set of parameters 
that depend only on the distance \x - y\, and are related to the prior MRF model 
for the disparity. f° may be chosen as: 


-O fl» if *7x ,d — max r(EQ Vxyt 

x,d lo, otherwise 

The convergence of this modified algorithm to the correct solution can still be 
guaranteed if condition (13) is replaced by the requirement that the cell corresponding 
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to the correct layer of every non-occltided point should be maximally stimulated at 
time 0, with respect to the other cells in the same column, by neighbors belonging 
to the same constant disparity region: 

ar ix,di + P { kf°, X, di) > aVx , d + P N (/°, X, d) (18) 

for every region every x 6 R, and every deQ. is the operator P N restricted 
to Ri'. 

p { 3(f,x,d)= £ c(|x — y|)/ M 

y£N x f)R { 

(this modification is necessary to cover the case in which a point near the boundary 
of a constant disparity region is partially stimulated by a wrong cluster outside this 
region which may disappear in succeeding iterations). 

Condition (14), i.e., the requirement that every wrong cluster has less ’’on" 
neighbors at time 0 on the wrong layer than in the correct one, can now be expressed 
by requiring that for every region R { and every wrong cluster Wj on layer j of R it 
there is at least one point x e Ri fl Wy such that: 


P N (f°,x,d j )<P N (f°,x,d i ) (19) 

Under these conditions, it is easy to use the same arguments of the proof of 
the convergence theorem to verify the convergence of algorithm (16). It should be 
remarked that conditions (18) and (19) are sufficient, but by no means necessary; 
(16) may converge to the correct solution even if they are violated by a particular 
stereogram. 

The determination of the optimal operators D and T in equation (15) is a 
difficult — and as yet unsolved problem. One scheme that has often been used is 
to define T as a convolution operator whose kernel is the Laplacian of a Gaussian 
function , and T as: 

T(a, b) = f ’ 

lo, 
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if ab > 0 

otherwise 




(sec Marr and Poggio, 1979). The rationale for this choice is that the zero crossings 
of the convolution with the Laplacian operator should pick the places where large 
intensity changes occur in both images (i.e., it acts as an "edge detector"), while the 
Gaussian kernel has the effect of smoothing out the "irrelevant" edges and filtering 
out the noise. One difficulty, however, is that if the Gaussian mask is large enough 
to produce the desired effect, it will also introduce errors in the localization of the 
zero crossings of the convolved images, which will translate into errors in the depth 
of the reconstructed surface (see Clark and Lawrence, 1985). 

We have found that the normalized absolute value of the Laplacian of the 
difference between left and right images: 

— v(x, d) + max r gQ v(x, r) 


Vx t d 


ma x r £Q v(x, r) — min re Q v(x, r) 


with 


v{x, d) = |V 2 (<7*(x + d) - g L (x ))| 


( 20 ) 


has relatively good experimental behavior, but clearly, much more research is 
needed in this area. 

It is important to note that the definition of rj will affect the performance of the 
WTA algorithm, since it will determine the extent to which conditions (18) and (19) 
hold in the initial state; the structure of the WTA network, however, is independent 
of the choice of rj, so that the experimentation with different definitions can be 
done very efficiently. 


7.3. Uniqueness Constraint. 

The definitions (12) and (16) imply the enforcement of the constraint: 

"Each point in the left image should be matched by only one point in the right 
image". 

That is to say, we are enforcing the uniqueness constraint along the left eye 
line of sight. It is also possible to include explicitly the corresponding constraint for 
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the right eye (as in Marr and Poggio, 1976). This is done by replacing (16) (or (12)) 
with: 

I T if u X)d (t) = mnx (re Q u Xfd >(t) 

and u x>d = max k:d+keQ u x -. kfd +k 
0, otherwise 

For perfect observations, this additional constraint is redundant. If noise or other 
distortions are present, however, this scheme will have better performance, since the 
disparity of "doubtful” points will be left unassigned (the corresponding values of 
the disparity in these locations may be determined after convergence by the robust 
surface reconstruction techniques described in chapter 5). 

As an example of the application of this technique, the processing of a stereo 
pair of aerial photographs is illustrated in figure 28 (this stereo pair is the same that 
was used in chapter 5; see figure 19). Although it is difficult to assess objectively 
the performance of an algorithm on this type of images, the quality of these results 
seems at least equivalent to that obtained by state-of-the-art systems (see Grimson, 
1984). 

7.4. Piecewise Smooth Surfaces. 

The WTA scheme can also be applied to reconstruct disparity surfaces that 
are piecewise smooth. To do this, it is only necessary to modify the definition of 
the operator P N (equation (16)), so that cells at nearby depths are also taken into 
account. Notice that, in order to be consistent with the WTA mechanism, only the 
maximum contribution for any given column should be considered. The modified 
operator is: 

P N {f i x ) d)= - i/|, \d-r\)fy }r } (21) 

yeN x 

where c(-, •) is some fixed decreasing function of its arguments, and N d is a disparity 
neighborhood defined as the intersection of a closed interval with the set of allowable 
disparities: 

Nd = [d - p, d + p] f) Q 
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Figure 28. (a) Stereo pair of aerial photographs, (b) Final state of the WTA network (disparity 
is coded by grey level; white areas have no assigned disparity), (c) Reconstructed sufacc, obtained 
using die algorithm described in section 6 of chapter 5. 
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where p is a positive constant. 

The sufficient conditions for the convergence of the modified algorithm, namely, 
that the stereogram should be unambiguous and have well defined boundariesWith 

. t 

respect to the the modified operator P N , can also be expressed in the form given 

■ i ■ 

by equations (18) and (19), but now a wrong cluster W 3 should be defined as a 
connected region on the disparity layer d 3 such that f° x>d . — 1 , and d 3 ^ d*(x) for 
all x 6 Wj, where d*(x) is the true disparity at point x. The proof of the convergence 
theorem is straightforward, but the interpretation of these conditions is not obvious, 
and in practice, they are very difficult to verify, so that at this point, the performance 
of this algorithm should be assessed experimentally. 

Pradzny (1984) (see also Pollard et. al., (1984)) has obtained good results for 
the reconstruction of piecewise smooth and "transparent" surfaces (i.e., stereograms 
portraying sets of small interspersed patches that belong to two smooth surfaces, 
one in front of the other) using an operator of the form: 

P N {f, x, d) = Y Y { c (\ x ~ v\> \ d ~ r \)fy,r} 

y£N a rE.Nd 

We believe that the use of (21) should improve the performance in these cases. 

8. Discussion 

In this chapter we have studied a class of recontruction problems that arise 
when the conditional distribution of the observations is a multimodal function, 
which causes the solution to remain ambiguous, even for arbitrarily high signal to 
noise ratio. We identified the signal matching problem as one of the most important 
instances of this class, and gave a probabilistic formulation for it using a MRF 
model to model the disparity surface, so that the optimal estimation algorithms 
derived in chapter 3 could be applied. 

We then presented a different approach to the solution of the problem in 
which the constraints derived both from the prior MRF model for the disparity 
field and from the observations are implemented directly as excitatory connections 
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on a three-dimensional cooperative network of processors (or "cells") with binary 
state space. The steady state of this network can be unambiguously interpreted as a 
disparity surface only if there is exactly one processor in each column whose state 
is equal to 1. This imposes a uniqueness constraint which can be enforced cither 
by introducing inhibitory linear connections, or by a "Winner-take-all" mechanism 
that operates within each column. We showed that, for high signal to noise ratio, it 
is possible to define precise sufficient conditions (which are usually met in the case 
of synthetic images) for the convergence of the state of this "WTA" network to the 
correct solution in a small number of iterations. 

The experimental performance of this algorithm with random dot stereograms 
is excellent; it produces accurate reconstructions in a very short time (in less than 5 
iterations). In the case of the reconstruction of real objects from stereoscopic 
photographs, this algorithm —■ with some modifications — produces results 
comparable with those obtained by more complicated schemes that are considered 
"state of the art", and it has the advantage of being directly implementable in 
parallel hardware. 

It should be noted that the performance of the stereoscopic vision of human 
beings on similar data is still dramatically superior to that of this, or any other 
existing artificial system. Some issues that should be addressed for the development 
of more effective algorithms are the following: 

(i) More realistic models for the observation process that take into account 
the nature of the relative distortions of the left and right images should be 
constructed. This should lead to the definition of optimal combinations of 
tokens for the matching process. The precise nature of the optical system 
used (which may cause problems like non-horizontal epipolar lines; vertical 
disparities, etc.) should also be taken into account. 

(ii) The use of more sophisticated prior models for the disparity field — 
including a coupled line field as described in chapter 5 — should be 
investigated. 

(iii) Since the intensity edges and the regions of uniform intensity (or uniform 
texture) of the images are natural candidates for becoming stereo matching 
tokens, and the location of depth and intensity (or texture) edges is likely 
to be correlated in a natural scene, the integration of edge detection; 
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image segmentation; stereo matching and surface reconstruction into a 
single estimation process should produce very good results. The Bayesian 
approach, and the use of coupled MRF models for describing surfaces 
and edges that we have presented in this thesis should provide a unified 
framework for performing this integration. We discuss this point further 
in the next chapter. 
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Appendix 6.A 


DYNAMIC PROGRAMMING APPROACH TO SIGNAL MATCHING 


Consider the one-dimensional version of the signal matching problem described 
in section 2. To compute the MAP estimate, we need to find the global minimum 
of: 

Ui>{d ; g) = — V i d i> d j) + ^ Z) M *) ln 2 + 

io »\j 2 * 

+~ SC 1 _ tdifytigdi) - gri{i + d i)) (i) 


(i.e., equation (4)) The use of the dynamic programming algorithm described in 
appendix 4.B is complicated by the fact that, given the boundaries L n between 
regions of constant disparity, the optimal estimate for d in the interval (L t -, Z/ t - +1 ] 
depends on the estimate on L t ], since this last choice determines the extent 
of the occluded region. 

However, if we assume that the size of the regions of constant disparity is 
relatively large compared with the size of the occluded areas (as it normally happens 
in most practical cases), we can estimate d given L n using the formula: 


d((L i} L i+l ]) = 

= {k: £ %lM ~ 9R[i + k)) < ~ 9n{i + /)), for all l G Q} (2) 

i—Li+l i=Li+l 

Defining: 

G k,l = Z ~ 9lt{i + d{{k, /]))) (3) 

t=fc+l 


and 



= max(0, — di) 
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(note that A t - corresponds to the length of the occluded region when a change in 
the estimated disparity occurs), we get that 


Ur{d ]g ) = ~ + U{L) 


with 

U{L) — G\ } i x + Ai In 2 + <3 li+Ai + i,l 2 + 

+A 2 In 2 + ... + Gj Jn+ A n +l,N 

We can now perform the global minimization 0 fU using the dynamic programming 
scheme of appendix 4.B. In this case, however, it is better to use "forward” 
recursions, (in the sense that now Fj(k) will represent the cost associated with 
putting j boundaries, in the best possible locations, in the interval [1, k}\ because 
occlusion, as we have defined it, always takes place from left to right. We have 
then: 


Fo(k) - G i >jfe 
L 0 {k) = 1 

F J+ i(k) = mf {Gi+a.+i k + Fj(i) + Ay In 2} 

t <k 

Lj+i(k) = {L:Gi+& i+ i f k + Fj(L) + Ay In 2 = Fj +{ (k)} 

The optimal location of the boundaries, for any given n is: 

S n = (T„(iV), L n -i(L n (N )),..., U{L 2(.. .[L n {N ])...)} 

The optimal configuration is computed using (2), and the corresponding energy, 
using (1). 

Note that as the size of the regions of constant disparity decreases, (2) may not 
be well defined (the optimal estimate d may not be unique) and a more complex 
optimization procedure may be required. 
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Chapter 7 


CONCLUSIONS 


In this thesis we have presented a probabilistic approach to the solution of 
a class of perceptual problems. We showed that these problems can be reduced 
to the recontruction of a function on a finite lattice from a set of degraded 
observations, and derived the Bayesian estimators that provide an optimal solution. 
We have also developed efficient distributed algorithms for the computation of these 
estimates, and discussed their implementation in different kinds of hardware. To 
demonstrate the generality and practical value of this approach, we studied in detail 
several applications: the segmentation of noise-corrupted images; the formation of 
perceptual clusters; the recontruction of piecewise smooth surfaces from sparse data 
and the reconstruction of depth from stereoscopic measurements. 

This methodology also permits, in principle, the incorporation of more than one 
modality of observations into a single estimation process, as well as the simultaneous 
estimation of several related functions from the same data set. This makes one hope 
that this framework could be useful in the solution of difficult problems that require 
such an integrated approach. We mention two examples: 

1. We mentioned in chapter 6 that the stereo matching problem in real situations 
has not been solved yet in a satisfactory way. The same can be said of other related 
perceptual problems such as: edge detection; image segmentation; the recovery 
of the shape of an object from a single two-dimensional image (the "shape form 
shading" problem), and the segmentation of a scene into distinct objects, as well 
as the recovery of their three-dimensional structure from the analysis of images 
formed at successive instants of time (the "structure from motion" problem). All 
these problems are obviously related, and it is intuitively clear that the individual 
solutions that can be obtained should improve if the mutual constraints that the 
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solution of each individual problem imposes on the others were taken into account. 
Thus, the presence of a brightness edge should increase the likelihood of a depth 
edge, and viceversa; the depth estimated from stereo should be compatible with 
the shape derived from shading; points belonging to the same region in an image 
should move together, etc. We believe that these constraints can be incorporated 
in the potential functions of the corresponding MRF models (in particular, of the 
coupled fields that represent the 'Mines" or edges in each case; see chapter 5). 

2. The processing and interpretation of geophysical information (as is done, for 
example in oil prospecting) attempts to reconstruct subterranean geological structures 
from information provided by a set of qualitatively different measurements, such as 
those obtained by: gravimetric and magnetometric surveying; reflexion seismology; 
measurements of physical properties taken vertically along bore holes ("well logs"), 
etc. Since all these measurements are obtained independently, their joint conditional 
probabilities can be easily determined, and since all of them refer to the same 
physical structures, their processing can, in principle, be integrated into a single 
estimation process, which should greatly increase the reliability of the results. 

The above considerations may be taken one step further. Ultimately, the results 
one is interested in are not only the quantitative reconstruction of some surfaces, but 
the symbolic description of the scene in terms of functional structures or "objects". 
On the other hand, the prior knowledge about the occurance of a particular object 
or class of objects might greatly simplify the tasks of the "low level" processors 
(for example, a letter recognition algorithm should greatly benefit from the use 
of context, given the probabilities of occurance of certain letter combinations or 
words). The Bayesian approach provides a common "language" that may allow 
these low-level and high-level (or symbolic) processes to communicate and mutually 
enhance their performance. 

As a simple example of this situation, suppose that we are interested in finding 
a symbolic description of a binary pattern / in terms of a set of geometric objects 
(such as squares, triangles, etc.) that are characterized by some parameters (such 
as position, orientation, size, etc.) for whose values we have some prior probability 
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knowledge. 


Given a description P, i.e., a list of objects with a set of particular values for 
their parameters, we can find a binary field q which corresponds to the boolean sum 
of the indicator functions of the objects included in P: 


qi = 



if an object in P covers pixel i 
otherwise 


We can now write the joint prior distribution for the field / (which represents the 
actual intensity of the noise-free image) and its description as: 

P(f, P) = P[f | P)P(P) 


To compute P(f | P ), we assume that / is a first order MRF whose configuration 
is biased by P: 

1 V I P) = 4 exp[-™ J2 V (fi’ fj ) + X X) «/»] 

** -to ij i 

P{D) can be computed from the prior probabilities for the occurance of each type 
of object, and from the prior distributions for the values of the corresponding 
parameters. Since the conditional distribution of the observations depends directly 
only on /, the posterior distribution will be: 

where P(g) is a constant From this expression we can compute the optimal estimates 
for / and D using methods similar to the ones developed here. 

We will now present a summary of our main results and a list of some interesting 
open technical questions. 


1. Summary of our Main Results. 


1.1. Optimal Bayesian Estimators. 
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Several researchers have used Bayes theory and MRF models for the restoration 
of piecewise uniform images. It has been implicitly assumed by all of them that the 
maximization of the posterior probability is the best possible performance criterion. 
We have shown that it is possible to choose other criteria that are better adapted 
to each particular problem, and have derived the corresponding optimal estimators, 
which not only improve substantially the quality of the results (particularly for 
low signal to noise ratios), but also lead to more efficient and better behaved 
computational schemes. 

1.2. General Monte Carlo Algorithms. 

We have shown that the optimal Bayesian estimators can be obtained from 
the observation of the equilibrium behavior of a MRF (which in physical terms 
correspond to a ferromagnet subject to a spatially varying external magnetic field). 
This behavior can be effectively simulated by Monte Carlo procedures which 
generate a regular Markov chain with an invariant Gibbs measure. 

This method differs from "simulated annealing" (which has been used to 
approximate the MAP estimator) in that it is based on the collection of statistics of 
the evolution of the chain at a fixed temperature, while the latter attempts to find the 
ground state of the coupled system by slowly decreasing it. From a computational 
viewpoint, our method exhibits a faster and more consistent convergence behavior. 

1.3. Parallel Implementations. 

The implementation of this general Monte Carlo procedure in parallel hardware 
was discussed. We proved that the Gibbs sampler (but not the Metropolis or Heat 
Bath algorithms) will produce consistent results in this case. 

1.4. Reconstruction of Piecewise Constant Funcions. 

The problem of reconstructing a piecewise constant function from noisy 
(but dense) observations was formulated in probabilistic terms, and the form of 
the optimal estimators derived. For the one-dimensional case, we presented a 
deterministic algorithm with minimal complexity which computes (exactly) the 
MAP estimate of binary fields. For the two-dimensional case, we presented a 
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method for improving the computational efficiency of the "Simulated Annealing" 
scheme for approximating the MAP estimator, and derived a fast algorithm for 
approximating the optimal (MPM) Bayesian estimator. 

We also presented a maximum likelihood procedure, which based on an analysis 
of the residual ("innovations") process permits the simultaneous estimation of the 
field and the parameters of the system. We applied this technique to the construction 
of a parameter-free algorithm for the reconstruction of arbitrary binary patterns. 

1.5. Formation of Perceptual Clusters. 

We suggested that the process of formation of perceptual clusters of certain 
dot patterns can be modeled in terms of the estimation of binary images corrupted 
by multiplicative noise, and illustrated the application of our estimation algorithm 
to this task. 

1.6. Reconstruction of Piecewise Continuous Surfaces. 

The problem of simultaneously detecting the discontinuities and recontructing 
a piecewise smooth surface from sparse observations was cast in the Bayesian 
framework. A model consisting of two coupled MRF’s: one representing the 
depth and the other the boundaries between continuous regions, was adapted to 
our problem. Since the straightforward use of the general Monte Carlo algorithm 
for finding the optimal estimate is computationally unfeasible in this case, an 
approximation (which showed an excellent experimental performance with both 
synthetic and "real" data) was derived and implemented. We also developed, and 
heuristically justified a fast algorithm that produces results that are practically 
indistinguishable from the optimal ones. The implementation of these procedures 
in digital parallel hardware, as well as in hybrid and analog networks was also 
discussed. 

1.7. Signal Matching. 

We presented a class of problems that is characterized by the fact that the 
conditional probability distribution of the observations P(g | /) is multimodal (as a 
function of /), which means that the solution remains ambiguous, even for arbitrarily 
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high signal to noise ratios. We studied a prototype problem of this class: the signal 
matching problem (in particular, the reconstruction of depth from stereoscopic pairs 
of images), and showed that it is possible, in principle, to find the solution using the 
general estimation procedures that we have developed (although the computational 
cost will be high in the general case). We also presented a different scheme, which 
is based on the direct implementation of the local constraints (generated by the 
probabilistic model) in a highly distributed cooperative network of a particular 
form: a "Winner-Take-All" network, and showed that the state of this network 
will converge to the correct solution in a few iterations (in the high SNR case). The 
application of this technique to the reconstruction of the depth of real objects from 
stereoscopic photographs was discussed, and some modifications to the algorithm 
were introduced, which permitted us to produce results which compare favourably 
with those of other "state of the art" algorithms. 

2. Open Technical Questions. 

2.1. Stochastic Models. 

We have shown throughout this work the richness and versatility of simple (first 
and second order) MRF models. It is clear,however, that there are classes of physical 
structures whose behavior cannot be adequately modeled by these processes (as a 
simple example, consider images formed by clusters of blobs of certain average size). 
There have been some attempts to model these and other "textured" patterns via a 
hierarchy of independent MRF’s: one that represents the structure of the image, in 
terms of regions of uniform texture, and individual models for each textured regions. 
This representation, however, is not very convenient for estimation purposes. A 
more rigorous approach has been suggested by Grenander (1984), who has proposed 
the use of generalized Markovian fields to model complex patterns; these fields 
consist of several layers of "generators", which in the first layer correspond to 
grey levels, and in the succeeding ones, to features of increased complexity (lines, 
corners, etc.). It is not clear, however, how to use this approach to construct models 
of textured images; objects of different shapes, etc. 
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These considerations suggest the need for much more research in this area, 
which should include, perhaps, the use of probabilistic models that are not based 
on the Gibbs distribution. 

2.2. Multiple Scale Representations. 

It is the current view that the production of high-level (symbolic) descriptions 
of a scene should be mediated by the construction of numerical descriptions of 
the surfaces involved at different "scales". The parameters that describe a MRF 
play in some sense the role of scale parameters (see figure 1 of chapter 1; section 
5 of appendix 4.B and section 6 of chapter 5); this identification, however, is not 
completely satisfactory. A good multiscale representation should feature not only a 
progressive blurring of detail, but the aggregation of substructures into larger units 
in a way that is not accomplished by the current algorithms. 

2.3. Parameter Estimation. 

Intimately liked with the previous questions, is the determination of the optimal 
set of parameters of a given model from noisy samples. The maximum likelihood 
method that we have presented here (see chapter 5) becomes computationally 
unfeasible as the complexity of the model (the dimensionality of the parameter 
space) increases; therefore, alternative procedures need to be derived (for instance, 
the use of time-varying algorithms, such as the one presented in section 6 of chapter 
5 should be more rigorously investigated). 

A related (and more difficult) question is the selection of the optimal model 
from a certain class given only the noisy observations. It is possible that the ideas of 
Rissanen (1978,1981,1983) about "minimum description length" schemes, and also 
of Akaike (1977) about generalized maximum likelihood methods could be useful in 
this connection, although the high computational complexity of the present problem 
might limit the applicability of these techniques. 

2.4. Fast Algorithms. 

The practical use of the general Monte Carlo estimation algorithms of chapter 
3 is limited by the relatively large number of iterations needed for the convergence 
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of these systems. A very important question, then, is how to improve on the 
convergence time without sacrificing the power of these methods. The use of 
"multigrid” type strategies (Brandt, 1973; Terzopoulos, 1984), which in the present 
case may take the form of "block-spin”” algorithms, such as the one presented in 
chapter 4 (see also White, 1983) should be investigated. 

Also in this connection, it should be interesting to find more rigorous justifications 
for the performance of the fast deterministic schemes that we have developed, based 
on heuristic considerations, in chapters 4 and 5, to see if it is possible to find some 
general principles that may guide the extension of these schemes to other, more 
general cases. 

2.5. Analog Computers. 

It would be interesting to actually construct prototypes of the hybrid and analog 
networks described in chapter 4 and 5, to assess the practicality and performance of 
such schemes. A more intriguing possibility is to exploit the isomorphism between 
the estimation process of a MRF from noisy data, and the equilibrium behavior 
of a ferromagnet with a coupled, spatially varying external field (see chapter 3), to 
construct very fast, special purpose "quantum” computers to perform the former 
task. 
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